Abstract: We consider compressed sensing of block-sparse signals,
i.e., sparse signals that have nonzero coefficients occurring
in clusters. An uncertainty relation for block-sparse signals is
derived, based on a block-coherence measure, which we introduce.
We then show that a block-version of the orthogonal matching
pursuit algorithm recovers block $k$-sparse signals in no more than
$k$ steps if the block-coherence is sufficiently small. The same
condition on block-coherence is shown to guarantee successful
recovery through a mixed $\ell_2/\ell_1$-optimization approach. This
complements previous recovery results for the block-sparse case
which relied on small block-restricted isometry constants. The
significance of the results presented in this paper lies in the fact
that making explicit use of block-sparsity can provably yield
better reconstruction properties than treating the signal as being
sparse in the conventional sense, thereby ignoring the additional
structure in the problem.



I. Introduction 

The framework of compressed sensing is concerned with 
the recovery of an unknown vector from an underdetermined 
system of linear equations [1], [2]. The key property exploited
for recovery of the unknown data is the assumption of sparsity. 
More concretely, denoting by x an unknown vector that is 
observed through a measurement matrix D according to y = 
Dx, it is assumed that x has only a few nonzero entries. A 
fundamental observation is that if D is chosen properly and x 
is sufficiently sparse, then x can be recovered from y = Dx, 
irrespectively of the locations of the nonzero entries of x, even 
if D has far fewer rows than columns. This result has given 
rise to a multitude of different recovery algorithms which can 
be proven to recover a sparse vector x under a variety of 
different conditions onD 0, g), 0, [Q, 0. 

Two widely studied recovery algorithms are the basis pursuit 
(BP), or $\ell_1$-minimization, approach Q, HI, and the orthogonal
matching pursuit (OMP) algorithm [8]. One of the main tools
for the characterization of the recovery abilities of BP is 
the restricted isometry property (RIP) JTJ, 0. Specifically, if 
the measurement matrix D satisfies the RIP with appropriate 
restricted isometry constants, then x can be recovered by 
BP. Unfortunately, determining the RIP constants of a given
matrix is in general an NP-hard problem. A simpler
and more convenient way to characterize recovery properties of a
dictionary is via the coherence measure iflOl . ifTTI . 0. It was 
shown in [5], [12] that appropriate conditions on the coherence
guarantee that both BP and OMP recover the sparse vector 
x. The coherence also plays an important role in uncertainty 
relations for sparse signals [10], [11], [13].

In this paper, we consider compressed sensing of sparse 
signals that exhibit additional structure in the form of the 
nonzero coefficients occurring in clusters. Such signals are 
referred to as block-sparse [14], [15]. Our goal is to explicitly
take this block structure into account, both in terms of the 
recovery algorithms and in terms of the measures that are 
used to characterize their performance. The significance of 
the results we obtain lies in the fact that making explicit 
use of block-sparsity can provably yield better reconstruction 
properties than treating the signal as being sparse in the 
conventional sense, thereby ignoring the additional structure 
in the problem. 

Block-sparsity arises naturally, e.g., when dealing with 
multi-band signals [16], [17], [18] or in measurements of
gene expression levels [19]. Another interesting special case of
the block-sparse model appears in the multiple measurement 
vector (MMV) problem, which deals with the measurement of 
a set of vectors that share a joint sparsity pattern EOl . EH . 
E2ll . lO, EH. Furthermore, it was shown in HU, [B] that 
the block-sparsity model can be used to treat the problem of 
sampling signals that lie in a union of subspaces E4ll . [25 1, 

ma, (no, ma, mo, ma. 

One approach to exploiting block-sparsity is by suitably 
extending the BP method, resulting in a mixed $\ell_2/\ell_1$-norm
recovery algorithm [14], [27]. It was shown in [14] that if D
has small block-restricted isometry constants, which general- 
izes the conventional RIP notion, then the mixed norm method 
is guaranteed to recover any block-sparse signal, irrespectively 
of the locations of the nonzero blocks. Furthermore, recovery 
will be robust in the presence of noise and modeling errors 
(i.e., when the vector is not exactly block-sparse). It was 
also established in [14] that certain random matrices satisfy
the block RIP with overwhelming probability, and that this 
probability is substantially larger than that of satisfying the 
standard RIP. In [28], extensions of the CoSaMP algorithm
[29] and of iterative hard thresholding [30] to the model-based
setting, which includes block-sparsity as a special case, are 
proposed and shown to exhibit provable recovery guarantees 
and robustness properties. 

The focus of the present paper is on developing a parallel
line of results by generalizing the notion of coherence to the
block setting. This can be seen as extending the program
laid out in [5], [12] to the block-sparse case. Specifically, we
define two separate notions of coherence: coherence within 
a block, referred to as sub-coherence and capturing local 
properties of the dictionary, and block-coherence, describing 
global dictionary properties. We will show that both coherence 
notions are necessary to characterize the essence of block- 
sparsity. We present extensions of the BP, the matching pursuit 
(MP), and the OMP algorithms to the block-sparse case and 
prove corresponding performance guarantees. 

We point out that the term block-coherence was used 
previously in [31] in the context of quantifying the recovery 
performance of the MP algorithm in block-incoherent dictionaries.
Our definition pertains to block-versions of the MP and
the OMP algorithm and is different from that used in [31].

We begin, in Section II, by introducing our definitions
of block-coherence and sub-coherence. In Section III, we
establish an uncertainty relation for block-sparse signals, and
show how the block-coherence measure defined previously
occurs naturally in this uncertainty relation. In Section IV,
we introduce a block version of the OMP algorithm, termed
BOMP, and of the MP algorithm [8], termed BMP, and
find a sufficient condition on block-coherence that guarantees
recovery of block $k$-sparse signals through BOMP in no more
than $k$ steps as well as exponential convergence of BMP. The
same condition on block-coherence is shown to guarantee
successful recovery through the mixed $\ell_2/\ell_1$-optimization
approach. The BOMP algorithm can be viewed as an extension
of the subspace OMP method for MMV systems [23]. The
proofs of our main results are contained in Section V. A
discussion on the performance improvements that can be obtained
through exploiting block-sparsity is provided in Section VI.
Corresponding numerical results are reported in Section VII.
We conclude in Section VIII.

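Throughout the paper, we denote vectors by boldface lowercase
letters, e.g., $\mathbf{x}$, and matrices by boldface uppercase letters,
e.g., $\mathbf{A}$. The identity matrix is written as $\mathbf{I}$, or $\mathbf{I}_d$ when the
dimension is not clear from the context. For a given matrix $\mathbf{A}$,
$\mathbf{A}^T$, $\mathbf{A}^H$, and $\mathrm{Tr}(\mathbf{A})$ denote its transpose, conjugate transpose,
and trace, respectively, $\mathbf{A}^\dagger$ is the pseudo-inverse, $\mathcal{R}(\mathbf{A})$
denotes the range space of $\mathbf{A}$, $A_{i,j}$ is the element in the $i$th row
and $j$th column of $\mathbf{A}$, and $\mathbf{a}_\ell$ stands for the $\ell$th column of $\mathbf{A}$.
The $\ell$th element of a vector $\mathbf{x}$ is denoted by $x_\ell$. The Euclidean
norm of the vector $\mathbf{x}$ is $\|\mathbf{x}\|_2 = \sqrt{\mathbf{x}^H \mathbf{x}}$, $\|\mathbf{x}\|_1 = \sum_\ell |x_\ell|$ is
the $\ell_1$-norm, $\|\mathbf{x}\|_\infty = \max_\ell |x_\ell|$ is the $\ell_\infty$-norm, and $\|\mathbf{x}\|_0$
designates the number of nonzero entries in $\mathbf{x}$. The Kronecker
product of the matrices $\mathbf{A}$ and $\mathbf{B}$ is written as $\mathbf{A} \otimes \mathbf{B}$. The
spectral norm of $\mathbf{A}$ is denoted by $\rho(\mathbf{A}) = \sqrt{\lambda_{\max}(\mathbf{A}^H \mathbf{A})}$, where
$\lambda_{\max}(\mathbf{B})$ is the largest eigenvalue of the positive-semidefinite
matrix $\mathbf{B}$.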

II. Block-Sparsity and Block-Coherence 

A. Block-sparsity 

We consider the problem of representing a vector $\mathbf{y} \in \mathbb{C}^L$
in a given dictionary $\mathbf{D}$ of size $L \times N$ with $L < N$, so that

y = Dx (1) 



for a coefficient vector $\mathbf{x} \in \mathbb{C}^N$. Since the system of equations
(1) is underdetermined, there are, in general, many possible
choices of $\mathbf{x}$ that satisfy (1) for a given $\mathbf{y}$. Therefore, further
assumptions on x are needed to guarantee uniqueness of the 
representation. Here, we consider the case of sparse vectors x, 
i.e., x has only a few nonzero entries relative to its dimension. 
The standard sparsity model considered in compressed sensing 
[1], [2] assumes that x has at most k nonzero elements, which
can appear anywhere in the vector. As discussed in [28],
[14], [16], there are practical scenarios that involve vectors
x with nonzero entries appearing in blocks (or clusters) rather 
than being arbitrarily spread throughout the vector. Specific 
examples include signals that lie in unions of subspaces [25 1, 
d, HI, Eg), and multi-band signals [16], [17], [18].

The recovery of block-sparse vectors x from measurements 
y = Dx is the focus of this paper. To define block-sparsity, 
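we view $\mathbf{x}$ as a concatenation of blocks (assumed throughout
the paper to be of length $d$), with $\mathbf{x}[\ell]$ denoting the $\ell$th block,
i.e.,

$$\mathbf{x} = [\underbrace{x_1 \,\cdots\, x_d}_{\mathbf{x}^T[1]} \;\; \underbrace{x_{d+1} \,\cdots\, x_{2d}}_{\mathbf{x}^T[2]} \;\; \cdots \;\; \underbrace{x_{N-d+1} \,\cdots\, x_N}_{\mathbf{x}^T[M]}]^T \tag{2}$$

where $N = Md$. We furthermore assume that $L = Rd$ with $R$
integer. Similarly to (2), we can represent $\mathbf{D}$ as a concatenation
of column-blocks $\mathbf{D}[\ell]$ of size $L \times d$:

$$\mathbf{D} = [\underbrace{\mathbf{d}_1 \,\cdots\, \mathbf{d}_d}_{\mathbf{D}[1]} \;\; \underbrace{\mathbf{d}_{d+1} \,\cdots\, \mathbf{d}_{2d}}_{\mathbf{D}[2]} \;\; \cdots \;\; \underbrace{\mathbf{d}_{N-d+1} \,\cdots\, \mathbf{d}_N}_{\mathbf{D}[M]}]. \tag{3}$$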



A vector $\mathbf{x} \in \mathbb{C}^N$ is called block $k$-sparse if $\mathbf{x}[\ell]$ has
nonzero Euclidean norm for at most $k$ indices $\ell$. When $d = 1$,
block-sparsity reduces to conventional sparsity as defined in
[1], [2]. Denoting

$$\|\mathbf{x}\|_{2,0} = \sum_{\ell=1}^{M} I(\|\mathbf{x}[\ell]\|_2 > 0) \tag{4}$$

with the indicator function $I(\cdot)$, a block $k$-sparse vector $\mathbf{x}$ is
defined as a vector that satisfies $\|\mathbf{x}\|_{2,0} \le k$. In the remainder
of the paper conventional sparsity will be referred to simply
as sparsity, in contrast to block-sparsity.

We are interested in providing conditions on the dictionary 
D ensuring that the block-sparse vector x can be recovered 
from measurements y of the form (1) through computationally
efficient algorithms. Our approach is partly based on Q, 
fPTl . lfT2l (and the mathematical techniques used therein) 
where equivalent results are provided for the sparse case. The 
two algorithms investigated are BOMP and a mixed $\ell_2/\ell_1$-optimization
program (referred to as L-OPT [14]). It was
shown in [14] that L-OPT yields perfect recovery if the dictionary
D satisfies appropriate restricted isometry properties.
The purpose of this paper is to provide recovery conditions 
for BOMP and L-OPT based on a suitably defined measure 
of block-coherence. We will see that block-coherence plays a 
role similar to coherence in the case of conventional sparsity. 

Before defining block-coherence, we note that in order to
have a unique block $k$-sparse $\mathbf{x}$ satisfying (1) it is clear that
we need $R > k$ and the columns within each block $\mathbf{D}[\ell]$, $\ell =
1, 2, \ldots, M$, need to be linearly independent. More generally,
we have the following proposition taken from [14].

Proposition 1. The representation (1) is unique if and only if
$\mathbf{D}\mathbf{g} \neq \mathbf{0}$ for every $\mathbf{g} \neq \mathbf{0}$ that is block $2k$-sparse.

From Proposition 1, the columns of $\mathbf{D}[\ell]$ are linearly independent
for all $\ell$. Throughout the paper, we assume that the
dictionaries we consider satisfy the condition of Proposition
1 and, furthermore, $\|\mathbf{d}_r\|_2 = 1$, $r = 1, 2, \ldots, N$.

B. Block-coherence 

The coherence of a dictionary D measures the similarity 
between basis elements, and is defined by [10], [11]

$$\mu = \max_{\ell,\, r \neq \ell} |\mathbf{d}_\ell^H \mathbf{d}_r|. \tag{5}$$



This definition was introduced in [8] to heuristically characterize
the performance of the MP algorithm, and was later shown
to play a fundamental role in quantifying recovery thresholds
for the OMP algorithm and for BP [5]. The coherence $\mu$
furthermore occurs in $\ell_1$-uncertainty relations relevant in the
context of decomposing a vector into two orthonormal bases
[10], [11]. A definition of coherence for analog signals, along
with a corresponding uncertainty relation, is provided in [13].

It is natural to seek a generalization of coherence to the
block-sparse setting with the resulting block-coherence measure
having the same operational significance as the coherence
$\mu$ in the sparse case. Below, we propose such a generalization,
which is shown, in Sections III and IV, to occur naturally
in uncertainty relations and in recovery thresholds for the
block-sparse case.

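We define the block-coherence of $\mathbf{D}$ as

$$\mu_B = \max_{\ell,\, r \neq \ell} \frac{1}{d}\, \rho(\mathbf{M}[\ell, r]) \tag{6}$$

with

$$\mathbf{M}[\ell, r] = \mathbf{D}^H[\ell]\, \mathbf{D}[r]. \tag{7}$$

Note that $\mathbf{M}[\ell, r]$ is the $(\ell, r)$th $d \times d$ block of the $N \times N$
matrix $\mathbf{M} = \mathbf{D}^H \mathbf{D}$. When $d = 1$, as expected, $\mu_B = \mu$.
While $\mu_B$ quantifies global properties of the dictionary $\mathbf{D}$,
local properties are described by the sub-coherence of $\mathbf{D}$,
defined as

$$\nu = \max_{\ell}\, \max_{i,\, j \neq i} |\mathbf{d}_i^H \mathbf{d}_j|, \qquad \mathbf{d}_i, \mathbf{d}_j \in \mathbf{D}[\ell]. \tag{8}$$

We define $\nu = 0$ for $d = 1$. In addition, if the columns of
$\mathbf{D}[\ell]$ are orthonormal for each $\ell$, then $\nu = 0$.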
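The definitions (5) to (8) translate directly into a few lines of linear algebra. The following NumPy sketch is illustrative only: the function name `coherence_measures` and its interface are our own assumptions, and the dictionary is assumed to have unit-norm columns and a number of columns that is a multiple of $d$.

```python
import numpy as np

def coherence_measures(D, d):
    """Return (mu, mu_B, nu) for a dictionary D with unit-norm columns.

    mu   : coherence, as in (5)
    mu_B : block-coherence, as in (6)-(7)
    nu   : sub-coherence, as in (8), set to 0 for d = 1
    """
    N = D.shape[1]
    assert N % d == 0, "number of columns must be a multiple of d"
    M = N // d
    G = D.conj().T @ D                        # Gram matrix M = D^H D
    off = np.abs(G - np.diag(np.diag(G)))     # magnitudes of off-diagonal entries
    mu = off.max()
    mu_B, nu = 0.0, 0.0
    for l in range(M):
        for r in range(M):
            blk = G[l * d:(l + 1) * d, r * d:(r + 1) * d]
            if l == r:
                if d > 1:
                    # largest inner product between distinct columns of block l
                    nu = max(nu, np.abs(blk - np.diag(np.diag(blk))).max())
            else:
                # spectral norm of an off-diagonal block, normalized by d
                mu_B = max(mu_B, np.linalg.norm(blk, 2) / d)
    return mu, mu_B, nu
```

For instance, calling `coherence_measures(D, 1)` on a dictionary formed by concatenating the identity and a unitary Fourier matrix returns equal values for `mu` and `mu_B`, in line with the remark that $\mu_B = \mu$ when $d = 1$.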

Since the columns of $\mathbf{D}$ have unit norm, the coherence $\mu$
in (5) satisfies $\mu \in [0, 1]$ and therefore, as a consequence of
$\nu \in [0, \mu]$, we have $\nu \in [0, 1]$. The following proposition
establishes the same limits for the block-coherence $\mu_B$, which
explains the choice of normalization by $1/d$ in the definition
(6).

In the remainder of the paper conventional coherence will be
referred to simply as coherence, in contrast to block-coherence
and sub-coherence.

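Proposition 2. The block-coherence satisfies $0 \le \mu_B \le \mu$.

Proof: Since the spectral norm is non-negative, clearly
$\mu_B \ge 0$. To prove that $\mu_B \le \mu$, note that the entries of $\mathbf{M}[\ell, r]$
for $\ell \neq r$ have absolute value smaller than or equal to $\mu$. It
then follows that

$$\mu_B = \max_{\ell,\, r \neq \ell} \frac{1}{d} \sqrt{\lambda_{\max}(\mathbf{M}^H[\ell, r]\, \mathbf{M}[\ell, r])}$$

$$\le \max_{\ell,\, r \neq \ell} \frac{1}{d} \sqrt{\max_i \sum_{j=1}^{d} \left|(\mathbf{M}^H[\ell, r]\, \mathbf{M}[\ell, r])_{i,j}\right|} \tag{9}$$

$$\le \frac{1}{d} \sqrt{d^2 \mu^2} = \mu \tag{10}$$

where (9) is a consequence of Gersgorin's disc theorem ([32,
Corollary 6.1.5]), and (10) holds because each entry of
$\mathbf{M}^H[\ell, r]\, \mathbf{M}[\ell, r]$ is a sum of $d$ products of magnitude at most $\mu^2$. ■
From $\mu \le 1$, with Proposition 2 it now follows trivially
that $\mu_B \le 1$.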

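When the columns of $\mathbf{D}[\ell]$ are orthonormal for each $\ell$, we
can further bound $\mu_B$.

Proposition 3. If $\mathbf{D}$ consists of orthonormal blocks, i.e.,
$\mathbf{D}^H[\ell]\, \mathbf{D}[\ell] = \mathbf{I}_d$ for all $\ell$, then $\mu_B \le 1/d$.

Proof: Using the submultiplicativity of the spectral norm,
we have

$$\mu_B = \max_{\ell,\, r \neq \ell} \frac{1}{d}\, \rho(\mathbf{M}[\ell, r]) = \max_{\ell,\, r \neq \ell} \frac{1}{d}\, \rho(\mathbf{D}^H[\ell]\, \mathbf{D}[r]) \le \max_{\ell,\, r \neq \ell} \frac{1}{d}\, \rho(\mathbf{D}^H[\ell])\, \rho(\mathbf{D}[r]) = \frac{1}{d} \tag{11}$$

where (11) follows from $\mathbf{D}^H[\ell]\, \mathbf{D}[\ell] = \mathbf{I}_d$, for all $\ell$,
$\lambda_{\max}(\mathbf{D}^H[\ell]\, \mathbf{D}[\ell]) = \lambda_{\max}(\mathbf{D}[\ell]\, \mathbf{D}^H[\ell])$, and $\lambda_{\max}(\mathbf{I}_d) = 1$
combined with the definition of the spectral norm. ■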



III. Uncertainty Relation for Block-Sparse 
Signals 

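We next show how the block-coherence $\mu_B$ defined above
naturally appears in an uncertainty relation for block-sparse
signals. This uncertainty relation generalizes the corresponding
result for the sparse case derived in [10], [11].

Uncertainty relations for sparse signals are concerned with
representations of a vector $\mathbf{x} \in \mathbb{C}^L$ in two different orthonormal
bases for $\mathbb{C}^L$: $\{\boldsymbol{\phi}_\ell, 1 \le \ell \le L\}$ and $\{\boldsymbol{\psi}_\ell, 1 \le \ell \le L\}$
[10], [11]. Any vector $\mathbf{x} \in \mathbb{C}^L$ can be expanded uniquely in
terms of each one of these bases according to:

$$\mathbf{x} = \sum_{\ell=1}^{L} a_\ell \boldsymbol{\phi}_\ell = \sum_{\ell=1}^{L} b_\ell \boldsymbol{\psi}_\ell. \tag{12}$$

The uncertainty relation sets limits on the sparsity of the
decompositions (12) for any $\mathbf{x} \in \mathbb{C}^L$. Specifically, denoting
$A = \|\mathbf{a}\|_0$ and $B = \|\mathbf{b}\|_0$, it is shown in [11] that

$$\frac{1}{2}(A + B) \ge \sqrt{AB} \ge \frac{1}{\mu(\boldsymbol{\Phi}, \boldsymbol{\Psi})} \tag{13}$$

where $\mu(\boldsymbol{\Phi}, \boldsymbol{\Psi})$ is the coherence between $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$, defined
as

$$\mu(\boldsymbol{\Phi}, \boldsymbol{\Psi}) = \max_{\ell, r} |\boldsymbol{\phi}_\ell^H \boldsymbol{\psi}_r|. \tag{14}$$

It is easily seen that for $\mathbf{D}$ consisting of the orthonormal bases
$\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$, i.e., $\mathbf{D} = [\boldsymbol{\Phi}\ \boldsymbol{\Psi}]$, we have $\mu(\boldsymbol{\Phi}, \boldsymbol{\Psi}) = \mu$, where $\mu$
is as defined in (5) and associated with $\mathbf{D} = [\boldsymbol{\Phi}\ \boldsymbol{\Psi}]$.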

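In [10] it is shown that $1/\sqrt{L} \le \mu(\boldsymbol{\Phi}, \boldsymbol{\Psi}) \le 1$. The upper
bound follows from the Cauchy-Schwarz inequality and the
fact that the basis elements have norm 1. The lower bound is
obtained as follows: The matrix $\mathbf{M} = \boldsymbol{\Phi}^H \boldsymbol{\Psi}$ is unitary so that
$\sum_{\ell=1}^{L} \sum_{r=1}^{L} |\boldsymbol{\phi}_\ell^H \boldsymbol{\psi}_r|^2 = \mathrm{Tr}(\mathbf{M}^H \mathbf{M}) = \mathrm{Tr}(\mathbf{I}_L) = L$. Consequently,
we have $L^2 \max_{\ell, r} |\boldsymbol{\phi}_\ell^H \boldsymbol{\psi}_r|^2 \ge L$ which implies
$\mu(\boldsymbol{\Phi}, \boldsymbol{\Psi}) \ge 1/\sqrt{L}$. This lower bound can be achieved, for
example, by choosing the two orthonormal bases $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$ as
the spike (identity) and Fourier bases [10]. With this choice,
the uncertainty relation (13) becomes

$$A + B \ge 2\sqrt{AB} \ge 2\sqrt{L}. \tag{15}$$

When $\sqrt{L}$ is an integer, the relations in (15) can all be satisfied
with equality by choosing $\mathbf{x}$ as a Dirac comb $\boldsymbol{\delta}_{\sqrt{L}}$ with spacing
$\sqrt{L}$, resulting in $\sqrt{L}$ nonzero elements. This follows from the
fact that the Fourier transform of $\boldsymbol{\delta}_{\sqrt{L}}$ is also $\boldsymbol{\delta}_{\sqrt{L}}$.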
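The equality case in (15) can be checked numerically. The sketch below is a small illustration under our own assumptions: the size $L = 16$, the tolerance used to count nonzero coefficients, and all variable names are chosen for the example only.

```python
import numpy as np

L = 16                                   # assumed example size; sqrt(L) = 4 is an integer
s = int(round(np.sqrt(L)))
n = np.arange(L)
# Columns of F form the (unitary) Fourier basis.
F = np.exp(2j * np.pi * np.outer(n, n) / L) / np.sqrt(L)

x = np.zeros(L)
x[::s] = 1.0                             # Dirac comb with spacing sqrt(L)

a = x                                    # coefficients in the spike (identity) basis
b = F.conj().T @ x                       # coefficients in the Fourier basis

tol = 1e-9                               # threshold for treating an entry as nonzero
A = int(np.sum(np.abs(a) > tol))
B = int(np.sum(np.abs(b) > tol))
print(A, B, 2 * np.sqrt(L))
```

Both sparsity levels equal $\sqrt{L}$ here, so that $A + B$ coincides with the lower bound $2\sqrt{L}$.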

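We now develop an uncertainty relation for block-sparse
decompositions. Specifically, we derive a result that is equivalent
to (13) with $A$ and $B$ replaced by block-sparsity levels as
defined in (4), and $\mu(\boldsymbol{\Phi}, \boldsymbol{\Psi})$ replaced by the block-coherence
between the orthonormal bases considered, and defined below
in (18).

Theorem 1. Let $\boldsymbol{\Phi}, \boldsymbol{\Psi}$ be two unitary $L \times L$ matrices with
$L \times d$ blocks $\{\boldsymbol{\Phi}[\ell], \boldsymbol{\Psi}[\ell], 1 \le \ell \le R\}$ and let $\mathbf{x} \in \mathbb{C}^L$ satisfy

$$\mathbf{x} = \sum_{\ell=1}^{R} \boldsymbol{\Phi}[\ell]\, \mathbf{a}[\ell] = \sum_{\ell=1}^{R} \boldsymbol{\Psi}[\ell]\, \mathbf{b}[\ell]. \tag{16}$$

Let $A = \|\mathbf{a}\|_{2,0}$ and $B = \|\mathbf{b}\|_{2,0}$. Then,

$$\frac{1}{2}(A + B) \ge \sqrt{AB} \ge \frac{1}{d\, \mu_B(\boldsymbol{\Phi}, \boldsymbol{\Psi})} \tag{17}$$

where

$$\mu_B(\boldsymbol{\Phi}, \boldsymbol{\Psi}) = \max_{\ell, r} \frac{1}{d}\, \rho(\boldsymbol{\Phi}^H[\ell]\, \boldsymbol{\Psi}[r]). \tag{18}$$

Note that for $\mathbf{D}$ consisting of the orthonormal bases $\boldsymbol{\Phi}$ and
$\boldsymbol{\Psi}$, i.e., $\mathbf{D} = [\boldsymbol{\Phi}\ \boldsymbol{\Psi}]$, we have $\mu_B(\boldsymbol{\Phi}, \boldsymbol{\Psi}) = \mu_B$, where $\mu_B$ is
as defined in (6) and associated with $\mathbf{D} = [\boldsymbol{\Phi}\ \boldsymbol{\Psi}]$.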

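Proof: Without loss of generality, we assume that $\|\mathbf{x}\|_2 = 1$. Then,

$$1 = \mathbf{x}^H \mathbf{x} = \sum_{\ell=1}^{R} \sum_{r=1}^{R} \mathbf{a}^H[\ell]\, \mathbf{A}[\ell, r]\, \mathbf{b}[r] \tag{19}$$

$$\le \sum_{\ell, r = 1}^{R} \left|\mathbf{a}^H[\ell]\, \mathbf{A}[\ell, r]\, \mathbf{b}[r]\right| \tag{20}$$

where we set $\mathbf{A}[\ell, r] = \boldsymbol{\Phi}^H[\ell]\, \boldsymbol{\Psi}[r]$. Now, from the Cauchy-Schwarz
inequality, for any $\mathbf{a}$, $\mathbf{b}$,

$$|\mathbf{a}^H \mathbf{A}[\ell, r]\, \mathbf{b}| \le \|\mathbf{b}\|_2\, \|\mathbf{A}^H[\ell, r]\, \mathbf{a}\|_2 \le \lambda_{\max}^{1/2}(\mathbf{A}[\ell, r]\, \mathbf{A}^H[\ell, r])\, \|\mathbf{b}\|_2 \|\mathbf{a}\|_2 \le d\, \mu_B\, \|\mathbf{b}\|_2 \|\mathbf{a}\|_2 \tag{21}$$

where, for brevity, we wrote $\mu_B = \mu_B(\boldsymbol{\Phi}, \boldsymbol{\Psi})$. Substituting
(21) into (19), (20), we get

$$1 \le d\, \mu_B \sum_{r=1}^{R} \|\mathbf{b}[r]\|_2 \sum_{\ell=1}^{R} \|\mathbf{a}[\ell]\|_2. \tag{22}$$

Applying the Cauchy-Schwarz inequality yields

$$\sum_{r=1}^{R} \|\mathbf{b}[r]\|_2 \le \sqrt{B} \left( \sum_{r=1}^{R} \|\mathbf{b}[r]\|_2^2 \right)^{1/2} = \sqrt{B} \tag{23}$$

where we used the fact that $\sum_{r=1}^{R} \|\mathbf{b}[r]\|_2^2 = \|\mathbf{b}\|_2^2 = 1$
since $\|\mathbf{x}\|_2^2 = 1$ and $\boldsymbol{\Psi}$ is unitary. Similarly, we have that
$\sum_{\ell=1}^{R} \|\mathbf{a}[\ell]\|_2 \le \sqrt{A}$. Substituting into (22) and using the
inequality of arithmetic and geometric means completes the
proof. ■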
The bound provided by Theorem 1 can be tighter than that
obtained by applying the conventional uncertainty relation (13)
to the block-sparse case. This can be seen by using $\|\mathbf{a}\|_0 \le
d\|\mathbf{a}\|_{2,0}$ and $\|\mathbf{b}\|_0 \le d\|\mathbf{b}\|_{2,0}$ in (13) to obtain

$$\sqrt{\|\mathbf{a}\|_{2,0}\, \|\mathbf{b}\|_{2,0}} \ge \frac{1}{d\mu}. \tag{24}$$

Since $\mu_B \le \mu$, this bound may be looser than (17).



A. Block-incoherent dictionaries 

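As already noted, in the sparse case (i.e., $d = 1$) for any two
orthonormal bases $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$, we have $\mu \ge 1/\sqrt{L}$. We next
show that the block-coherence satisfies a similar inequality,
namely $\mu_B \ge 1/\sqrt{dL}$.

Proposition 4. The block-coherence (18) satisfies $\mu_B \ge
1/\sqrt{dL}$.

Proof: Let $\boldsymbol{\Phi}$ and $\boldsymbol{\Psi}$ be two orthonormal bases for $\mathbb{C}^L$
and let $\mathbf{A} = \boldsymbol{\Phi}^H \boldsymbol{\Psi}$ with $\mathbf{A}[\ell, r]$ denoting the $(\ell, r)$th $d \times d$
block of $\mathbf{A}$. With $R = L/d$, we have

$$R^2 \mu_B^2 \ge \sum_{\ell=1}^{R} \sum_{r=1}^{R} \frac{1}{d^2}\, \lambda_{\max}(\mathbf{A}^H[\ell, r]\, \mathbf{A}[\ell, r]) \ge \frac{1}{d^2}\, \lambda_{\max}\!\left( \sum_{\ell=1}^{R} \sum_{r=1}^{R} \mathbf{A}^H[\ell, r]\, \mathbf{A}[\ell, r] \right). \tag{25}$$

Now, it holds that

$$\sum_{\ell=1}^{R} \sum_{r=1}^{R} \mathbf{A}^H[\ell, r]\, \mathbf{A}[\ell, r] = \sum_{r=1}^{R} \boldsymbol{\Psi}^H[r] \left( \sum_{\ell=1}^{R} \boldsymbol{\Phi}[\ell]\, \boldsymbol{\Phi}^H[\ell] \right) \boldsymbol{\Psi}[r]. \tag{26}$$

Since $\boldsymbol{\Phi}$ is a square matrix consisting of orthonormal columns,
we have $\sum_{\ell=1}^{R} \boldsymbol{\Phi}[\ell]\, \boldsymbol{\Phi}^H[\ell] = \boldsymbol{\Phi} \boldsymbol{\Phi}^H = \mathbf{I}_L$. Furthermore, since
$\boldsymbol{\Psi}[r]$ consists of orthonormal columns for each $r$, we have
$\boldsymbol{\Psi}^H[r]\, \boldsymbol{\Psi}[r] = \mathbf{I}_d$, so that the double sum in (26) equals $R\, \mathbf{I}_d$.
Therefore, (25) becomes

$$\mu_B^2 \ge \frac{1}{d^2 R} = \frac{1}{dL} \tag{27}$$

which concludes the proof. ■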
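We now construct a pair of bases that achieves the lower
bound in (27) and therefore has the smallest possible block-coherence.
Let $\mathbf{F}$ be the DFT matrix of size $R = L/d$ with
$F_{\ell, r} = (1/\sqrt{R}) \exp(j 2\pi \ell r / R)$. Define $\boldsymbol{\Phi} = \mathbf{I}_L$ and

$$\boldsymbol{\Psi} = \mathbf{F} \otimes \mathbf{U}_d \tag{28}$$

where $\mathbf{U}_d$ is an arbitrary $d \times d$ unitary matrix. For this choice,
$\boldsymbol{\Phi}^H[\ell]\, \boldsymbol{\Psi}[r] = F_{\ell, r}\, \mathbf{U}_d$. Since $\rho(\mathbf{U}_d) = 1$ and $|F_{\ell, r}| = 1/\sqrt{R}$,
we get

$$\mu_B = \frac{1}{d\sqrt{R}} = \frac{1}{\sqrt{dL}}. \tag{29}$$

When $d = 1$, this basis pair reduces to the spike-Fourier pair
which is well known to be maximally incoherent [10].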
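The value in (29) can be reproduced numerically from the construction (28). The following NumPy sketch is illustrative: the helper names `fourier_kron_pair` and `pair_block_coherence` are our own, and drawing $\mathbf{U}_d$ as the Q factor of a random complex matrix is merely one convenient way, assumed here, of obtaining a unitary matrix.

```python
import numpy as np

def fourier_kron_pair(R, d, seed=0):
    """Return (Phi, Psi) with Phi = I_L and Psi = F kron U_d, where L = R * d."""
    rng = np.random.default_rng(seed)
    idx = np.arange(R)
    F = np.exp(2j * np.pi * np.outer(idx, idx) / R) / np.sqrt(R)
    # Assumption: any unitary U_d is admissible; we use the Q factor of a
    # random complex Gaussian matrix.
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    U, _ = np.linalg.qr(G)
    return np.eye(R * d), np.kron(F, U)

def pair_block_coherence(Phi, Psi, d):
    """Block-coherence between two bases, following (18)."""
    R = Phi.shape[1] // d
    A = Phi.conj().T @ Psi
    return max(
        np.linalg.norm(A[l * d:(l + 1) * d, r * d:(r + 1) * d], 2)
        for l in range(R) for r in range(R)
    ) / d

Phi, Psi = fourier_kron_pair(R=4, d=3)
print(pair_block_coherence(Phi, Psi, d=3), 1 / np.sqrt(3 * 12))
```

The two printed numbers should agree up to rounding, since every $d \times d$ block of $\boldsymbol{\Phi}^H \boldsymbol{\Psi}$ is a unitary matrix scaled by an entry of $\mathbf{F}$.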
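When $\mu_B$ satisfies (29) the uncertainty relation becomes

$$A + B \ge 2\sqrt{AB} \ge 2\sqrt{R}. \tag{30}$$

If $\sqrt{R}$ is integer, the inequalities in (30) are met with equality
for the signal $\mathbf{x} = \boldsymbol{\delta}_{\sqrt{R}} \otimes \mathbf{c}$ where $\mathbf{c}$ is an arbitrary nonzero
length-$d$ vector. Indeed, in this case, the representation of $\mathbf{x}$
in the spike basis requires $\sqrt{R}$ blocks (of size $d$), so that
$\|\mathbf{a}\|_{2,0} = \sqrt{R}$. The representation of $\mathbf{x}$ in the basis $\boldsymbol{\Psi}$ in (28)
is obtained as

$$\mathbf{b} = (\mathbf{F}^H \otimes \mathbf{U}_d^H)(\boldsymbol{\delta}_{\sqrt{R}} \otimes \mathbf{c}) = \boldsymbol{\delta}_{\sqrt{R}} \otimes \mathbf{U}_d^H \mathbf{c} \tag{31}$$

where we used the fact that the Fourier transform of $\boldsymbol{\delta}_{\sqrt{R}}$
is also $\boldsymbol{\delta}_{\sqrt{R}}$. Therefore, $\mathbf{b}$ has $\sqrt{R}$ nonzero blocks so that
$\|\mathbf{b}\|_{2,0} = \sqrt{R}$ and hence $A = B = \sqrt{R}$, which implies that
all inequalities in (30) are met with equality.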
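As a numerical illustration of the equality case in (30), the sketch below builds $\mathbf{x} = \boldsymbol{\delta}_{\sqrt{R}} \otimes \mathbf{c}$ and counts the nonzero blocks in both bases. The sizes $R = 4$, $d = 3$, the choice $\mathbf{U}_d = \mathbf{I}_d$, the vector `c`, and the helper `block_sparsity` are assumptions made for the example.

```python
import numpy as np

R, d = 4, 3                               # assumed example sizes; sqrt(R) = 2
s = int(round(np.sqrt(R)))
idx = np.arange(R)
F = np.exp(2j * np.pi * np.outer(idx, idx) / R) / np.sqrt(R)
U = np.eye(d)                             # simplest admissible unitary U_d
Psi = np.kron(F, U)                       # basis from (28); Phi is the identity

comb = np.zeros(R)
comb[::s] = 1.0                           # Dirac comb with spacing sqrt(R)
c = np.array([1.0, -2.0, 0.5])            # arbitrary nonzero length-d vector
x = np.kron(comb, c)

def block_sparsity(v, d, tol=1e-9):
    """Number of length-d blocks of v with nonzero Euclidean norm, as in (4)."""
    return int(np.sum(np.linalg.norm(v.reshape(-1, d), axis=1) > tol))

A = block_sparsity(x, d)                  # block-sparsity in the spike basis
B = block_sparsity(Psi.conj().T @ x, d)   # block-sparsity in the basis Psi
print(A, B, 2 * np.sqrt(R))
```

With these choices both block-sparsity levels equal $\sqrt{R}$, so $A + B$ attains the lower bound $2\sqrt{R}$.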

IV. Efficient Recovery Algorithms 

We now give operational meaning to block-coherence by
showing that if it is small enough, then a block-sparse signal
$\mathbf{x}$ can be recovered from the measurements $\mathbf{y} = \mathbf{D}\mathbf{x}$ using
computationally efficient algorithms. We consider two different
recovery methods, namely the mixed $\ell_2/\ell_1$-optimization
program (L-OPT) proposed in [14]:

$$\min_{\mathbf{x}} \sum_{\ell=1}^{M} \|\mathbf{x}[\ell]\|_2 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{D}\mathbf{x} \tag{32}$$

and an extension of the OMP algorithm [8] to the block-sparse
case described below and termed block-OMP (BOMP).
We then derive thresholds on the block-sparsity level as a
function of $\mu_B$ and $\nu$ for both methods to recover the correct
block-sparse $\mathbf{x}$. For L-OPT this complements the results in
[14] that establish the recovery capabilities of L-OPT under
the condition that $\mathbf{D}$ satisfies a block-RIP with a small
enough restricted isometry constant. For the special case of
the columns of $\mathbf{D}[\ell]$ being orthonormal for each $\ell$, we suggest
a block-version of the MP algorithm [8], termed block-MP
(BMP).



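A. Block OMP and block MP

The BOMP algorithm begins by initializing the residual as
$\mathbf{r}_0 = \mathbf{y}$. At the $\ell$th stage ($\ell \ge 1$) we choose the block that is
best matched to $\mathbf{r}_{\ell-1}$ according to:

$$i_\ell = \arg\max_{i} \|\mathbf{D}^H[i]\, \mathbf{r}_{\ell-1}\|_2. \tag{33}$$

Once the index $i_\ell$ is chosen, we find $\mathbf{x}_\ell[i]$ as the solution to

$$\min_{\{\mathbf{x}_\ell[i]\}_{i \in \mathcal{I}}} \left\| \mathbf{y} - \sum_{i \in \mathcal{I}} \mathbf{D}[i]\, \mathbf{x}_\ell[i] \right\|_2^2 \tag{34}$$

where $\mathcal{I}$ is the set of chosen indices $i_j$, $1 \le j \le \ell$. The residual
is then updated as

$$\mathbf{r}_\ell = \mathbf{y} - \sum_{i \in \mathcal{I}} \mathbf{D}[i]\, \mathbf{x}_\ell[i]. \tag{35}$$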


In the special case of the columns of $\mathbf{D}[\ell]$ being orthonormal
for each $\ell$ (the elements across different blocks do not have
to be orthonormal), we consider an extension of the MP
algorithm to the block-case. The resulting algorithm, termed
BMP, starts by initializing the residual as $\mathbf{r}_0 = \mathbf{y}$ and at the
$\ell$th stage ($\ell \ge 1$) chooses the block that is best matched to
$\mathbf{r}_{\ell-1}$ according to (33). Then, however, the algorithm does
not perform a least-squares minimization over the blocks that
have already been selected, but directly updates the residual
according to

$$\mathbf{r}_\ell = \mathbf{r}_{\ell-1} - \mathbf{D}[i_\ell]\, \mathbf{D}^H[i_\ell]\, \mathbf{r}_{\ell-1}. \tag{36}$$
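A corresponding sketch of BMP is given below. It assumes, as in the text, that the columns within each block are orthonormal; the function name `bmp`, the fixed iteration count `n_iter`, and the accumulation of coefficients in `x_hat` are our own choices for illustration.

```python
import numpy as np

def bmp(D, y, d, n_iter):
    """Block-MP sketch for dictionaries whose blocks have orthonormal columns.

    Returns the accumulated coefficient estimate and the residual norms
    ||r_0||_2, ||r_1||_2, ..., one per stage.
    """
    D = np.asarray(D, dtype=complex)
    r = np.asarray(y, dtype=complex).copy()    # r_0 = y
    M = D.shape[1] // d
    x_hat = np.zeros(D.shape[1], dtype=complex)
    res_norms = [np.linalg.norm(r)]
    for _ in range(n_iter):
        # correlations D^H[i] r_{l-1} for every block
        corr = [D[:, i * d:(i + 1) * d].conj().T @ r for i in range(M)]
        # (33): block best matched to the residual
        i_l = int(np.argmax([np.linalg.norm(c) for c in corr]))
        x_hat[i_l * d:(i_l + 1) * d] += corr[i_l]
        # (36): subtract the projection onto the selected block
        r = r - D[:, i_l * d:(i_l + 1) * d] @ corr[i_l]
        res_norms.append(np.linalg.norm(r))
    return x_hat, res_norms
```

Unlike BOMP, a block may be selected more than once here, since the residual is only made orthogonal to the most recently chosen block.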



B. Recovery conditions 

Our main result, summarized in Theorems 2 and 3 below,
is that any block $k$-sparse vector $\mathbf{x}$ can be recovered from
measurements $\mathbf{y} = \mathbf{D}\mathbf{x}$ using either the BOMP algorithm or
L-OPT if the block-coherence satisfies $kd < (\mu_B^{-1} + d - (d -
1)\nu \mu_B^{-1})/2$. In the special case of the columns of $\mathbf{D}[\ell]$ being
orthonormal for each $\ell$, we have $\nu = 0$ and therefore the
recovery condition becomes $kd < (\mu_B^{-1} + d)/2$. In this case
BMP exhibits exponential convergence rate (see Theorem 4).
If the block-sparse vector $\mathbf{x}$ was treated as a (conventional)
$kd$-sparse vector without exploiting knowledge of the block-sparsity
structure, a sufficient condition for perfect recovery
using OMP or (32) for $d = 1$ (known as BP) is $kd <
(\mu^{-1} + 1)/2$. Comparing with $kd < (\mu_B^{-1} + d)/2$, we can see
that, thanks to $\mu_B \le \mu$, making explicit use of block-sparsity
leads to guaranteed recovery for a potentially higher sparsity
level. Later, we will establish conditions for such a result to
hold even when $\nu \neq 0$.
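To make the comparison between the two thresholds concrete, the following sketch evaluates both conditions. The function names are hypothetical, and the numerical values in the example correspond to the dictionary $[\mathbf{I}_L\ \ \mathbf{F} \otimes \mathbf{I}_d]$ with $R = 32$ and $d = 2$, for which $\mu = 1/\sqrt{R}$, $\mu_B = 1/\sqrt{dL}$, and $\nu = 0$.

```python
import math

def block_recovery_bound(mu_B, nu, d):
    """Right-hand side of kd < (1/mu_B + d - (d - 1) * nu / mu_B) / 2."""
    return 0.5 * (1.0 / mu_B + d - (d - 1) * nu / mu_B)

def conventional_recovery_bound(mu):
    """Right-hand side of kd < (1/mu + 1) / 2 for conventional sparsity."""
    return 0.5 * (1.0 / mu + 1.0)

def largest_guaranteed_k(bound, d):
    """Largest integer k >= 0 such that k * d is strictly below the bound."""
    return max(math.ceil(bound / d) - 1, 0)

# Example values (assumed): R = 32, d = 2, so L = 64 and dL = 128.
d = 2
mu, mu_B, nu = 1 / math.sqrt(32), 1 / math.sqrt(128), 0.0
k_block = largest_guaranteed_k(block_recovery_bound(mu_B, nu, d), d)
k_conv = largest_guaranteed_k(conventional_recovery_bound(mu), d)
print(k_block, k_conv)
```

In this example the block-coherence condition covers a larger number of nonzero blocks than the conventional coherence condition applied to the same signal viewed as $kd$-sparse.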

To formally state our main results, suppose that $\mathbf{x}_0$ is a
length-$N$ block $k$-sparse vector, and let $\mathbf{y} = \mathbf{D}\mathbf{x}_0$. Let $\mathbf{D}_0$
denote the $L \times (kd)$ matrix whose blocks correspond to the
nonzero blocks of $\mathbf{x}_0$, and let $\bar{\mathbf{D}}_0$ be the matrix of size $L \times
(N - kd)$ which contains the $L \times d$ blocks of $\mathbf{D}$ that are not in
$\mathbf{D}_0$. We then have the following theorem proved in Section V.



Theorem 2. Let $\mathbf{x}_0 \in \mathbb{C}^N$ be a block $k$-sparse vector with
blocks of length $d$, and let $\mathbf{y} = \mathbf{D}\mathbf{x}_0$ for a given $L \times N$
matrix $\mathbf{D}$. A sufficient condition for the BOMP and the L-OPT
algorithm to recover $\mathbf{x}_0$ is that

$$\rho_c(\mathbf{D}_0^\dagger \bar{\mathbf{D}}_0) < 1 \tag{37}$$

where

$$\rho_c(\mathbf{A}) = \max_{r} \sum_{\ell} \rho(\mathbf{A}[\ell, r]) \tag{38}$$

and $\mathbf{A}[\ell, r]$ is the $(\ell, r)$th $d \times d$ block of $\mathbf{A}$. In this case, BOMP
picks up a correct new block in each step, and consequently
converges in at most $k$ steps.

Note that

$$\rho_c(\mathbf{D}_0^\dagger \bar{\mathbf{D}}_0) = \max_{r} \rho_c(\mathbf{D}_0^\dagger \bar{\mathbf{D}}_0[r]). \tag{39}$$

Therefore, (37) implies that for all $r$,

$$\rho_c(\mathbf{D}_0^\dagger \bar{\mathbf{D}}_0[r]) < 1. \tag{40}$$

The sufficient condition (37) depends on $\mathbf{D}_0$ and therefore
on the location of the nonzero blocks in $\mathbf{x}_0$, which, of
course, is not known in advance. Nonetheless, as the following
theorem, proved in Section V, shows, (37) holds universally
under certain conditions on $\mu_B$ and $\nu$ associated with the
dictionary $\mathbf{D}$.

Theorem 3. Let $\mu_B$ be the block-coherence and $\nu$ the sub-coherence
of the dictionary $\mathbf{D}$. Then (37) is satisfied if

$$kd < \frac{1}{2} \left( \mu_B^{-1} + d - (d - 1) \frac{\nu}{\mu_B} \right). \tag{41}$$

For $d = 1$, and therefore $\nu = 0$, we recover the corresponding
condition $k < (\mu^{-1} + 1)/2$ reported in [5], [12]. In the special
case where the columns of $\mathbf{D}[\ell]$ are orthonormal for each $\ell$,
we have $\nu = 0$ and (41) becomes

$$kd < \frac{1}{2} \left( \mu_B^{-1} + d \right). \tag{42}$$

The next theorem shows that under condition (42), BMP
exhibits exponential convergence rate in the case where each
block $\mathbf{D}[\ell]$ consists of orthonormal columns.

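Theorem 4. If $\mathbf{D}^H[\ell]\, \mathbf{D}[\ell] = \mathbf{I}_d$, for all $\ell$, and $kd < (\mu_B^{-1} +
d)/2$, then we have:

1) BMP picks up a correct block in each step.

2) The energy of the residual decays exponentially, i.e.,
$\|\mathbf{r}_\ell\|_2^2 \le \beta^\ell \|\mathbf{r}_0\|_2^2$ with

$$\beta = 1 - \frac{1 - (k - 1)\, d\, \mu_B}{k}. \tag{43}$$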



V. Proofs of Theorems 2, 3, and 4

Before proceeding with the actual proofs, we start with 
some definitions and basic results that will be used throughout 
this section. 

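For $\mathbf{x} \in \mathbb{C}^N$, we define the general mixed $\ell_2/\ell_p$-norm
($p = 1, 2, \infty$ here and in the following):

$$\|\mathbf{x}\|_{2,p} = \|\mathbf{v}\|_p, \quad \text{where } v_\ell = \|\mathbf{x}[\ell]\|_2 \tag{44}$$

and the $\mathbf{x}[\ell]$ are consecutive length-$d$ blocks. For an $L \times N$
matrix $\mathbf{A}$ with $L = Rd$ and $N = Md$, where $R$ and $M$ are
integers, we define the mixed matrix norm (with block size $d$)
as

$$\|\mathbf{A}\|_{2,p} = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|\mathbf{A}\mathbf{x}\|_{2,p}}{\|\mathbf{x}\|_{2,p}}. \tag{45}$$

The following lemma provides bounds on $\|\mathbf{A}\|_{2,p}$, which
will be used in the sequel.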

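Lemma 1. Let $\mathbf{A}$ be an $L \times N$ matrix with $L = Rd$ and
$N = Md$. Denote by $\mathbf{A}[\ell, r]$ the $(\ell, r)$th $d \times d$ block of $\mathbf{A}$.
Then,

$$\|\mathbf{A}\|_{2,\infty} \le \max_{\ell} \sum_{r} \rho(\mathbf{A}[\ell, r]) = \rho_r(\mathbf{A}) \tag{46}$$

$$\|\mathbf{A}\|_{2,1} \le \max_{r} \sum_{\ell} \rho(\mathbf{A}[\ell, r]) = \rho_c(\mathbf{A}). \tag{47}$$

In particular, $\rho_r(\mathbf{A}) = \rho_c(\mathbf{A}^H)$.

Proof: See Appendix A. ■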
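The quantities in (44), (46), and (47) are straightforward to evaluate numerically, which gives a quick sanity check of Lemma 1 on a random example. The helper names below (`mixed_norm`, `block_spectral_norms`, `rho_c`, `rho_r`) and the test dimensions are our own assumptions.

```python
import numpy as np

def mixed_norm(x, d, p):
    """Mixed l2/lp norm (44): lp norm of the vector of block-wise l2 norms."""
    v = np.linalg.norm(np.reshape(x, (-1, d)), axis=1)   # v_l = ||x[l]||_2
    return np.linalg.norm(v, p)                          # p in {1, 2, np.inf}

def block_spectral_norms(A, d):
    """Array whose (l, r) entry is the spectral norm of the block A[l, r]."""
    R, M = A.shape[0] // d, A.shape[1] // d
    return np.array([[np.linalg.norm(A[l * d:(l + 1) * d, r * d:(r + 1) * d], 2)
                      for r in range(M)] for l in range(R)])

def rho_c(A, d):
    """Maximum over block-columns of the summed block spectral norms, as in (47)."""
    return block_spectral_norms(A, d).sum(axis=0).max()

def rho_r(A, d):
    """Maximum over block-rows of the summed block spectral norms, as in (46)."""
    return block_spectral_norms(A, d).sum(axis=1).max()

rng = np.random.default_rng(1)
d = 2
A = rng.standard_normal((6, 8)) + 1j * rng.standard_normal((6, 8))
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
print(mixed_norm(A @ x, d, 1) <= rho_c(A, d) * mixed_norm(x, d, 1))
print(mixed_norm(A @ x, d, np.inf) <= rho_r(A, d) * mixed_norm(x, d, np.inf))
```

A single random vector cannot confirm the bounds on the matrix norms in general; the check merely illustrates that $\rho_c$ and $\rho_r$ upper-bound the ratio in (45) for the sampled $\mathbf{x}$.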



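Lemma 2. $\rho_c(\mathbf{A})$ as defined in (38) is a matrix norm and as
such satisfies the following properties:

• Nonnegative: $\rho_c(\mathbf{A}) \ge 0$

• Positive: $\rho_c(\mathbf{A}) = 0$ if and only if $\mathbf{A} = \mathbf{0}$

• Homogeneous: $\rho_c(\alpha \mathbf{A}) = |\alpha|\, \rho_c(\mathbf{A})$ for all $\alpha \in \mathbb{C}$

• Triangle inequality: $\rho_c(\mathbf{A} + \mathbf{B}) \le \rho_c(\mathbf{A}) + \rho_c(\mathbf{B})$

• Submultiplicative: $\rho_c(\mathbf{A}\mathbf{B}) \le \rho_c(\mathbf{A})\, \rho_c(\mathbf{B})$.

Proof: See Appendix B. ■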



A. Proof of Theorem 2 for BOMP

We begin by proving that (37) is sufficient to ensure
recovery using the BOMP algorithm. We first show that if $\mathbf{r}_{\ell-1}$
is in $\mathcal{R}(\mathbf{D}_0)$, then the next chosen index $i_\ell$ will correspond to a
block in $\mathbf{D}_0$. Assuming that this is true, it follows immediately
that $i_1$ is correct since clearly $\mathbf{r}_0 = \mathbf{y}$ lies in $\mathcal{R}(\mathbf{D}_0)$. Noting
that $\mathbf{r}_\ell$ lies in the space spanned by $\mathbf{y}$ and $\mathbf{D}[i]$, $i \in \mathcal{I}_\ell$, where
$\mathcal{I}_\ell$ denotes the indices chosen up to stage $\ell$, it follows that if
$\mathcal{I}_\ell$ corresponds to correct indices, i.e., $\mathbf{D}[i]$ is a block of $\mathbf{D}_0$
for all $i \in \mathcal{I}_\ell$, then $\mathbf{r}_\ell$ also lies in $\mathcal{R}(\mathbf{D}_0)$ and the next index
will be correct as well. Thus, at every step a correct $L \times d$
block of $\mathbf{D}$ is selected. As we will show below no index will
be chosen twice since the new residual is orthogonal to all the
previously chosen subspaces; consequently the correct $\mathbf{x}_0$ will
be recovered in $k$ steps.



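We first show that if $\mathbf{r}_{\ell-1} \in \mathcal{R}(\mathbf{D}_0)$, then under (37) the
next chosen index corresponds to a block in $\mathbf{D}_0$. This is
equivalent to requiring that

$$z(\mathbf{r}_{\ell-1}) = \frac{\|\bar{\mathbf{D}}_0^H \mathbf{r}_{\ell-1}\|_{2,\infty}}{\|\mathbf{D}_0^H \mathbf{r}_{\ell-1}\|_{2,\infty}} < 1. \tag{48}$$

From the properties of the pseudo-inverse, it follows that
$\mathbf{D}_0 \mathbf{D}_0^\dagger$ is the orthogonal projector onto $\mathcal{R}(\mathbf{D}_0)$. Hence, it holds
that $\mathbf{D}_0 \mathbf{D}_0^\dagger \mathbf{r}_{\ell-1} = \mathbf{r}_{\ell-1}$. Since $\mathbf{D}_0 \mathbf{D}_0^\dagger$ is Hermitian, we have

$$(\mathbf{D}_0^\dagger)^H \mathbf{D}_0^H \mathbf{r}_{\ell-1} = \mathbf{r}_{\ell-1}. \tag{49}$$

Substituting (49) into (48) yields

$$z(\mathbf{r}_{\ell-1}) = \frac{\|\bar{\mathbf{D}}_0^H (\mathbf{D}_0^\dagger)^H \mathbf{D}_0^H \mathbf{r}_{\ell-1}\|_{2,\infty}}{\|\mathbf{D}_0^H \mathbf{r}_{\ell-1}\|_{2,\infty}} \le \rho_r(\bar{\mathbf{D}}_0^H (\mathbf{D}_0^\dagger)^H) = \rho_c(\mathbf{D}_0^\dagger \bar{\mathbf{D}}_0) \tag{50}$$

where we used Lemma 1.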

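It remains to show that BOMP in each step chooses a new
block participating in the (unique) representation $\mathbf{y} = \mathbf{D}\mathbf{x}$. We
start by defining $\mathbf{D}_\ell = [\mathbf{D}[i_1] \cdots \mathbf{D}[i_\ell]]$ where $i_j \in \mathcal{I}$, $1 \le
j \le \ell$. It follows that the solution of the minimization problem
in (34) is given by

$$\hat{\mathbf{x}} = (\mathbf{D}_\ell^H \mathbf{D}_\ell)^{-1} \mathbf{D}_\ell^H \mathbf{y}$$

which upon inserting into (35) yields

$$\mathbf{r}_\ell = (\mathbf{I} - \mathbf{D}_\ell (\mathbf{D}_\ell^H \mathbf{D}_\ell)^{-1} \mathbf{D}_\ell^H)\, \mathbf{y}.$$

Now, we note that $\mathbf{D}_\ell (\mathbf{D}_\ell^H \mathbf{D}_\ell)^{-1} \mathbf{D}_\ell^H$ is the orthogonal projector
onto the range space of $\mathbf{D}_\ell$. Therefore $\|\mathbf{D}^H[i]\, \mathbf{r}_\ell\|_2 = 0$
for all blocks $\mathbf{D}[i]$ that lie in the span of the matrix $\mathbf{D}_\ell$. By the
assumption in Proposition 1 we are guaranteed that as long
as $\ell < k$ there exists at least one block (in $\mathbf{D}_0$) which does not
lie in the span of $\mathbf{D}_\ell$. Since this block (or these blocks) will
lead to strictly positive $\|\mathbf{D}^H[i]\, \mathbf{r}_\ell\|_2$ the result is established.
This concludes the proof.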

B. Proof of Theorem 2 for L-OPT

We next show that (37) is also sufficient to ensure recovery
using L-OPT. To this end we rely on the following lemma:

Lemma 3. Suppose that $\mathbf{v} \in \mathbb{C}^{kd}$ with $\|\mathbf{v}[\ell]\|_2 > 0$, for all
$\ell$, and that $\mathbf{A}$ is a matrix of size $L \times (kd)$, with $L = Rd$ and
the $d \times d$ blocks $\mathbf{A}[\ell, r]$. Then, $\|\mathbf{A}\mathbf{v}\|_{2,1} \le \rho_c(\mathbf{A})\, \|\mathbf{v}\|_{2,1}$. If
in addition the values of $\rho_c(\mathbf{A}\mathbf{J}_\ell)$ are not all equal, then the
inequality is strict. Here, $\mathbf{J}_\ell$ is a $(kd) \times d$ matrix that is all
zero except for the $\ell$th $d \times d$ block which equals $\mathbf{I}_d$.
Proof: See Appendix C. ■
To prove that L-OPT recovers the correct vector Xo, let 
x' 7^ xo be another length-iV block fc-sparse vector for 
which y = Dx'. Denote by Co and c' the length-fcd vectors 
consisting of the nonzero elements of x and x', respectively. 
Let Do and D' denote the corresponding columns of D so that 
y = DoCo = D'c'. From the assumption in Proposition [T] 
it follows that there cannot be two different representations 
using the same blocks Dq. Therefore, D' must contain at least 



one block, Z, that is not included in Do- From (40i, we get 
P c (DqZ) < 1. For any other block U in D, we must have 
that 

Pc (DjU) < 1. (51) 

Indeed, if U 6 Dn, then U = T>q[£] = DoJf where 3t was 
defined in Lemmapl In this case, DqDoM = Je and therefore 
/5 c (DjU) = p c (3 e ) = 1. If, on the other hand, U = T>[£] for 



Now, suppose first that the $(kd) \times d$ blocks in $D_0^\dagger D'$ do
not all have the same$^1$ $\rho_c$. Then,

\begin{align}
\|c_0\|_{2,1} &= \|D_0^\dagger D_0 c_0\|_{2,1} = \|D_0^\dagger D' c'\|_{2,1} \tag{52} \\
&< \rho_c(D_0^\dagger D')\,\|c'\|_{2,1} \tag{53} \\
&\le \|c'\|_{2,1} \tag{54}
\end{align}

where the first equality is a consequence of the columns of $D_0$
being linearly independent (a consequence of the assumption
in Proposition 1), the first inequality follows from Lemma 3
since $\|c'[\ell]\|_2 > 0$, for all $\ell$, and the last inequality follows
from (51). If all the $(kd) \times d$ blocks in $D_0^\dagger D'$ have identical
$\rho_c$, then the inequality (53) is no longer strict, but the second
inequality (54) becomes strict instead as a consequence of
$\rho_c(D_0^\dagger Z) < 1$; therefore

\begin{equation}
\|c_0\|_{2,1} < \|c'\|_{2,1} \tag{55}
\end{equation}

still holds.
Since $\|x_0\|_{2,1} = \|c_0\|_{2,1}$ and $\|x'\|_{2,1} = \|c'\|_{2,1}$, we
conclude that under (40), any set of coefficients used to
represent the original signal that is not equal to $x_0$ will result
in a larger $\ell_2/\ell_1$-norm.

C. Proof of Theorem 3

We start by deriving an upper bound on $\rho_c(D_0^\dagger \bar{D}_0)$ in terms
of $\mu_B$ and $\nu$. Writing $D_0^\dagger$ out, we have that

\begin{equation*}
\rho_c(D_0^\dagger \bar{D}_0) = \rho_c\bigl((D_0^H D_0)^{-1} D_0^H \bar{D}_0\bigr).
\end{equation*}

Submultiplicativity of $\rho_c(A)$ (Lemma 2) implies that

\begin{align}
\rho_c(D_0^\dagger \bar{D}_0) &\le \rho_c\bigl((D_0^H D_0)^{-1}\bigr)\,\rho_c(D_0^H \bar{D}_0) \notag \\
&= \rho_c\bigl((D_0^H D_0)^{-1}\bigr) \max_{j \notin \Lambda_0} \sum_{i \in \Lambda_0} \rho(D^H[i] D[j]) \tag{56}
\end{align}

where $\Lambda_0$ is the set of indices $\ell$ for which $D[\ell]$ is in $D_0$. Since
$\Lambda_0$ contains $k$ indices, the last term in (56) is bounded above
by $kd\mu_B$, which allows us to conclude that

\begin{equation}
\rho_c(D_0^\dagger \bar{D}_0) \le \rho_c\bigl((D_0^H D_0)^{-1}\bigr)\, k d \mu_B. \tag{57}
\end{equation}

It remains to develop a bound on $\rho_c((D_0^H D_0)^{-1})$. To this
end, we express $D_0^H D_0$ as $D_0^H D_0 = I + A$, where $A$ is a
$(kd) \times (kd)$ matrix with blocks $A[\ell, r]$ of size $d \times d$ such
that $A_{i,i} = 0$, for all $i$. This follows from the fact that the
columns of $D_0$ are normalized. Since $A[\ell, r] = D_0^H[\ell] D_0[r]$,
for all $\ell \ne r$, and $A[r, r] = D_0^H[r] D_0[r] - I_d$, we have

\begin{align}
\rho_c(A) &= \max_r \sum_\ell \rho(A[\ell, r]) \notag \\
&\le \max_r \rho(A[r, r]) + \max_r \sum_{\ell \ne r} \rho(A[\ell, r]) \tag{58} \\
&\le (d-1)\nu + (k-1)d\mu_B \tag{59}
\end{align}

where the first term in (59) is obtained by applying Geršgorin's
disc theorem ([32, Corollary 6.1.5]) together with the definition
of $\nu$; the second term in (59) follows from the fact that the
summation in the second term of (58) is over $k - 1$ elements
and $\rho(A[\ell, r])$, for all $\ell \ne r$, can be upper-bounded by $d\mu_B$.

$^1$Note that for an $(sd) \times d$ matrix $A$, $\rho_c(A) = \sum_\ell \rho(A[\ell])$, where
$A[\ell]$, $\ell = 1, 2, \ldots, s$, denotes the $d \times d$ block of $A$ made up of the rows
$\{(\ell-1)d+1, \ldots, \ell d\}$.

Assumption (41) now implies that $(d-1)\nu + (k-1)d\mu_B < 1$
and therefore, from (59), we have $\rho_c(A) < 1$.
We next use the following result. 

Lemma 4. Suppose that $\rho_c(A) < 1$. Then $(I + A)^{-1} = \sum_{k=0}^{\infty} (-A)^k$.

Proof: Follows immediately by using the fact that $\rho_c(A)$
is a matrix norm (cf. Lemma 2) and applying [32, Corollary
5.6.16]. ■
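Thanks to Lemma 4, we have that

\begin{align}
\rho_c\bigl((D_0^H D_0)^{-1}\bigr) &= \rho_c\Bigl(\sum_{k=0}^{\infty} (-A)^k\Bigr) \notag \\
&\le \sum_{k=0}^{\infty} \bigl(\rho_c(A)\bigr)^k = \frac{1}{1 - \rho_c(A)} \tag{60} \\
&\le \frac{1}{1 - (d-1)\nu - (k-1)d\mu_B}. \tag{61}
\end{align}

Here, (60) is a consequence of $\rho_c(A)$ satisfying the triangle
inequality and being submultiplicative, and (61) follows by
using (59).

Combining (61) with (57), we get

\begin{equation}
\rho_c(D_0^\dagger \bar{D}_0) \le \frac{k d \mu_B}{1 - (d-1)\nu - (k-1)d\mu_B} < 1 \tag{62}
\end{equation}

where the last inequality is a consequence of (41).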
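The bound (62) can be checked numerically on a small example. The following is a minimal sketch, not part of the original proof: it assumes the dictionary $D = [I_L \;\; F \otimes I_d]$ discussed in Section VI (for which $\nu = 0$ and $\mu_B = 1/(d\sqrt{R})$), and the helper names `rho_c` and `block_columns`, the sizes, and the chosen support are illustrative assumptions.

```python
import numpy as np

def rho_c(A, d):
    """Largest block-column sum of spectral norms of the d x d blocks of A."""
    n_rows, n_cols = A.shape[0] // d, A.shape[1] // d
    return max(
        sum(np.linalg.norm(A[l*d:(l+1)*d, r*d:(r+1)*d], 2) for l in range(n_rows))
        for r in range(n_cols)
    )

def block_columns(D, d, blocks):
    """Stack the listed blocks (each d columns wide) of D side by side."""
    idx = np.concatenate([np.arange(i*d, (i+1)*d) for i in blocks])
    return D[:, idx]

# Assumed example sizes: L = R*d rows, block size d, block sparsity k.
R, d, k = 16, 2, 2
L = R * d
F = np.fft.fft(np.eye(R)) / np.sqrt(R)             # unitary DFT matrix
D = np.hstack([np.eye(L), np.kron(F, np.eye(d))])  # D = [I_L, F kron I_d]
mu_B, nu = 1.0 / (d * np.sqrt(R)), 0.0             # values for this dictionary

support = [3, 20]                                  # arbitrary choice of k blocks
rest = [i for i in range(D.shape[1] // d) if i not in support]
D0, D0_bar = block_columns(D, d, support), block_columns(D, d, rest)

lhs = rho_c(np.linalg.pinv(D0) @ D0_bar, d)
bound = k * d * mu_B / (1 - (d - 1) * nu - (k - 1) * d * mu_B)
print(f"rho_c = {lhs:.4f} <= bound = {bound:.4f} < 1")
```

For these sizes the right-hand side of (62) evaluates to 2/3, so the sufficient condition of Theorem 2 is met for any support of two blocks.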



D. Proof of Theorem 4

The proof of the first part of Theorem 4 follows from the
arguments in the proofs of Theorems 2 and 3 for $\nu = 0$. As
a consequence of the first statement of Theorem 4, we get
that the residual $r_\ell$ in each step of the algorithm will be in
$\mathcal{R}(D_0)$. For the proof of the second statement in Theorem 4,
we mimic the corresponding proof in [33]. We first need the
following lemma, which is an extension of [34, Lemma 3.5]
to the block-sparse case. This lemma will provide us with a
lower bound on the amount of energy that can be removed
from the residual in one step of the BMP algorithm.

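Lemma 5. Let $D_0$ denote the $L \times (kd)$ matrix whose blocks
correspond to the nonzero blocks of $x_0$. Then, we have

\begin{equation}
\max_i \|D_0^H[i]\, r_\ell\|_2 \ge \frac{\|r_\ell\|_2^2}{\|c_\ell\|_{2,1}} \tag{63}
\end{equation}

where $c_\ell$ is the coefficient vector corresponding to $r_\ell \ne 0$,
i.e., $r_\ell = D_0 c_\ell$.

Proof: We start by noting that
$r_\ell = D_0 c_\ell = \sum_{i=1}^{k} D_0[i]\, c_\ell[i]$, where $c_\ell[i] \ne 0$ for at least one index
$i \in \{1, 2, \ldots, k\}$. It follows that

\begin{align}
\|r_\ell\|_2^2 &= \sum_{i=1}^{k} c_\ell^H[i]\, D_0^H[i]\, r_\ell \notag \\
&\le \sum_{i=1}^{k} \bigl|c_\ell^H[i]\, D_0^H[i]\, r_\ell\bigr| \notag \\
&\le \sum_{i=1}^{k} \|c_\ell[i]\|_2\, \|D_0^H[i]\, r_\ell\|_2 \notag \\
&\le \Bigl(\max_i \|D_0^H[i]\, r_\ell\|_2\Bigr) \sum_{i=1}^{k} \|c_\ell[i]\|_2. \tag{64}
\end{align}

The result then follows by noting that $\sum_{i=1}^{k} \|c_\ell[i]\|_2 = \|c_\ell\|_{2,1}$. ■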

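Next, we compute an upper bound on $\|c_\ell\|_{2,1}$. Using
$M[i,j] = D_0^H[i] D_0[j]$, where $i, j \in \{1, \ldots, k\}$, we get

\begin{align}
\|r_\ell\|_2^2 &= c_\ell^H D_0^H D_0 c_\ell \notag \\
&= \sum_{i=1}^{k} \sum_{j=1}^{k} c_\ell^H[i]\, M[i,j]\, c_\ell[j] \notag \\
&= \sum_{i=1}^{k} \|c_\ell[i]\|_2^2 + \sum_{i=1}^{k} \sum_{j \ne i} c_\ell^H[i]\, M[i,j]\, c_\ell[j] \notag \\
&\ge \sum_{i=1}^{k} \|c_\ell[i]\|_2^2 - \sum_{i=1}^{k} \sum_{j \ne i} \bigl|c_\ell^H[i]\, M[i,j]\, c_\ell[j]\bigr| \tag{65}
\end{align}

where we used the fact that $M[i,i] = I_d$, for all $i$, as
a consequence of each of the blocks of $D_0$ consisting of
orthonormal vectors. Applying the Cauchy-Schwarz inequality
to the second term in (65), we get

\begin{align}
\|r_\ell\|_2^2 &\ge \|c_\ell\|_2^2 - \sum_{i=1}^{k} \sum_{j \ne i} \|c_\ell[i]\|_2\, \|M[i,j]\, c_\ell[j]\|_2 \tag{66} \\
&\ge \|c_\ell\|_2^2 - \sum_{i=1}^{k} \sum_{j \ne i} \|c_\ell[i]\|_2\, \|c_\ell[j]\|_2\, d\mu_B \tag{67} \\
&= \|c_\ell\|_2^2 - d\mu_B \sum_{s=1}^{k-1} \sum_{i=1}^{k} \|c_\ell[i]\|_2\, \|c_\ell[(i+s)_k]\|_2 \tag{68}
\end{align}

where $\|c_\ell\|_2^2 = \sum_{i=1}^{k} \|c_\ell[i]\|_2^2$, $(i+s)_k$ stands for $(i+s)$ modulo $k$, (67) follows
from $\|M[i,j]\, c_\ell[j]\|_2 \le d\mu_B \|c_\ell[j]\|_2$, and (68) is obtained
by merely rearranging terms in the summation in (67). Applying
the Cauchy-Schwarz inequality to the inner product
$\sum_{i=1}^{k} \|c_\ell[i]\|_2\, \|c_\ell[(i+s)_k]\|_2$, we obtain

\begin{align}
\|r_\ell\|_2^2 &\ge \|c_\ell\|_2^2 - d\mu_B \sum_{s=1}^{k-1} \|c_\ell\|_2^2 \tag{69} \\
&= \bigl(1 - (k-1)d\mu_B\bigr) \|c_\ell\|_2^2 \notag \\
&\ge \frac{1 - (k-1)d\mu_B}{k}\, \|c_\ell\|_{2,1}^2 \tag{70}
\end{align}

where (70) follows by the same argument as used in (23).
Thus, combining (64) with (70), we get

\begin{equation}
\max_i \|D_0^H[i]\, r_\ell\|_2^2 \ge \frac{1 - (k-1)d\mu_B}{k}\, \|r_\ell\|_2^2. \tag{71}
\end{equation}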



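Since, by the first statement in Theorem 4, BMP picks a block
in $D_0$ in each step, we can bound the energy of the residual
in the $(\ell+1)$st step as

\begin{align}
\|r_{\ell+1}\|_2^2 &= \|r_\ell\|_2^2 - \|D^H[i_{\ell+1}]\, r_\ell\|_2^2 \tag{72} \\
&= \|r_\ell\|_2^2 - \max_i \|D_0^H[i]\, r_\ell\|_2^2 \notag \\
&\le \Bigl(1 - \frac{1 - (k-1)d\mu_B}{k}\Bigr) \|r_\ell\|_2^2 \tag{73}
\end{align}

where in (72) we used the fact that $r_{\ell+1}$ is orthogonal to
$D[i_{\ell+1}]\, D^H[i_{\ell+1}]\, r_\ell$. This concludes the proof.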
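The geometric decay in (73) can be observed in a small simulation. The sketch below is illustrative only: it assumes the dictionary $D = [I_L \;\; F \otimes U_d]$ from Section VI with a random orthogonal $U_d$ (so every block is orthonormal and $\mu_B = 1/(d\sqrt{R})$), and the function names, sizes, and random seed are assumptions introduced here.

```python
import numpy as np

def structured_dictionary(R, d, rng):
    """D = [I_L, F kron U_d] with L = R*d; each block of d columns is orthonormal."""
    L = R * d
    F = np.fft.fft(np.eye(R)) / np.sqrt(R)            # unitary DFT matrix
    U, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal U_d
    return np.hstack([np.eye(L), np.kron(F, U)])

def bmp_residuals(D, y, d, steps):
    """Run block matching pursuit and return the squared residual norm per step."""
    n_blocks = D.shape[1] // d
    r = y.astype(complex)
    energies = [np.linalg.norm(r) ** 2]
    for _ in range(steps):
        # Select the block with the largest correlation energy.
        corr = [np.linalg.norm(D[:, i*d:(i+1)*d].conj().T @ r) for i in range(n_blocks)]
        i = int(np.argmax(corr))
        B = D[:, i*d:(i+1)*d]
        r = r - B @ (B.conj().T @ r)   # remove the projection onto the chosen block
        energies.append(np.linalg.norm(r) ** 2)
    return np.array(energies)

rng = np.random.default_rng(0)
R, d, k = 16, 2, 2                     # kd = 4 < (1/mu_B + d)/2 = 5
D = structured_dictionary(R, d, rng)
mu_B = 1.0 / (d * np.sqrt(R))
beta = 1.0 - (1.0 - (k - 1) * d * mu_B) / k   # contraction factor from (73)

# Block k-sparse signal with Gaussian entries on a random block support.
support = rng.choice(2 * R, size=k, replace=False)
x0 = np.zeros(D.shape[1])
for i in support:
    x0[i*d:(i+1)*d] = rng.standard_normal(d)

energies = bmp_residuals(D, D @ x0, d, steps=10)
print("beta =", beta)
print("residual energies:", energies)
```

With these parameters the contraction factor is 0.625, and each printed energy should be at most 0.625 times its predecessor, up to rounding.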



Fig. 1. Recovery thresholds for both block-sparsity and conventional sparsity
for $R = 10$ as a function of $d$. (Curves: $(\mu_B^{-1} + d)/2$ and $(\mu^{-1} + 1)/2$;
horizontal axis: block size $d$.)



VI. Discussion 

Theorem 3 indicates under which conditions exploiting
block-sparsity leads to higher recovery thresholds than treating
the block-sparse signal as a (conventionally) sparse signal. For
dictionaries $D$ where the individual blocks $D[\ell]$ consist of
orthonormal columns, for each $\ell$, we have $\nu = 0$ and hence,
thanks to $\mu_B \le \mu$, recovery through exploiting block-sparsity
is guaranteed for a potentially higher sparsity level. If the
individual blocks $D[\ell]$ are, however, not orthonormal, we have
$\nu > 0$, and (41) shows that $\nu$ has to be small for block-sparse
recovery to result in higher recovery thresholds than
sparse recovery. It is now natural to consider the case where
one starts with a general dictionary $D$ and orthogonalizes
the individual blocks $D[\ell]$ so that $\nu = 0$. The comparison
that is meaningful here is between the recovery threshold of
the original dictionary $D$ without exploiting block-sparsity
and the recovery threshold of the orthogonalized dictionary
taking block-sparsity into account. To this end, we start by
noting that the assumption in Proposition 1 implies that the
columns of $D[\ell]$ are linearly independent, for each $\ell$. We
can therefore write $D[\ell] = A[\ell] W_\ell$ where $A[\ell]$ consists of
orthonormal columns that span $\mathcal{R}(D[\ell])$ and $W_\ell$ is invertible.
The orthogonalized dictionary is given by the $L \times N$ matrix
$A$ with blocks $A[\ell]$. Since $D = AW$ with the $N \times N$
block-diagonal matrix $W$ with blocks $W_\ell$, we conclude that
$c = Wx$ is block-sparse and, thanks to the invertibility
of the $W_\ell$, of the same block-sparsity level as $x$, i.e.,
orthogonalization preserves the block-sparsity level. It is easy
to see that the definition of block-coherence in (6) is invariant
to the choice of orthonormal basis $A[\ell]$ for $\mathcal{R}(D[\ell])$. This is
because any other basis has the form $A[\ell] U_\ell$ for some unitary
matrix $U_\ell$, and from the properties of the spectral norm

\begin{equation}
\rho(M[\ell, r]) = \rho(U_\ell^H M[\ell, r]\, U_r) \tag{74}
\end{equation}



for any unitary matrices $U_\ell, U_r$. Unfortunately, it seems
difficult to derive general results on the relation between $\mu$
before and $\mu_B$ after orthogonalization. Nevertheless, we can
establish a minimum block size $d$ above which orthogonalization
followed by block-sparse recovery leads to a guaranteed
improvement in the recovery thresholds. We first note that the
coherence $\mu$ of a dictionary consisting of $N = Md$ elements
in a vector space of dimension $L = Rd$ can be lower-bounded
as [35]

\begin{equation}
\mu \ge \sqrt{\frac{M - R}{R(Md - 1)}} \ge \sqrt{\frac{M - R}{RMd}}. \tag{75}
\end{equation}

Using this lower bound together with Proposition 3 and the
fact that after orthogonalization we have $\nu = 0$, it can
be shown that if $d > RM/(M - R)$, then the recovery
threshold obtained from taking block-sparsity into account
in the orthogonalized dictionary is higher than the recovery
threshold corresponding to conventional sparsity in the original
dictionary. This is true irrespective of the dictionary we
start from as long as the dictionary satisfies the conditions
of Proposition 1.
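One way to see why $d > RM/(M-R)$ suffices is the following short calculation. It is a sketch under two assumptions that are not spelled out in this section: the conventional threshold has the form $kd < (\mu^{-1} + 1)/2$ and the block threshold with $\nu = 0$ has the form $kd < (\mu_B^{-1} + d)/2$ (the curves of Fig. 1), and orthonormal blocks satisfy $\mu_B \le 1/d$, since the spectral norm of a product of two matrices with orthonormal columns is at most one.

```latex
\begin{align*}
\text{conventional:}\quad
  \tfrac{1}{2}\bigl(\mu^{-1} + 1\bigr)
  &\le \tfrac{1}{2}\Bigl(\sqrt{\tfrac{RMd}{M-R}} + 1\Bigr)
   && \text{by (75)} \\
  &< \tfrac{1}{2}(d + 1) \le d
   && \text{since } \tfrac{RM}{M-R} < d \text{ and } d \ge 1, \\[4pt]
\text{block } (\nu = 0)\text{:}\quad
  \tfrac{1}{2}\bigl(\mu_B^{-1} + d\bigr)
  &\ge \tfrac{1}{2}(d + d) = d
   && \text{since } \mu_B \le 1/d .
\end{align*}
```

Under these assumptions the conventional threshold stays strictly below $d$ while the block threshold is at least $d$, which is the claimed improvement.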

Finally, we note that finding dictionaries that lead to significant
improvements in the recovery thresholds when exploiting
block-sparsity seems to be a difficult design problem. For example,
partitioning the realizations of i.i.d. Gaussian matrices
into blocks will, in general, not lead to satisfactory results.
Nevertheless, there do exist dictionaries where significant
improvements are possible. Consider, for example, the pair
of bases $\Phi = I_L$ and $\Psi = F \otimes U_d$ shown in Section III-A
to achieve the lower bound in (27). For the corresponding
dictionary $D = [\Phi \;\; \Psi]$, we have $M = 2R$, $\mu_B = 1/(d\sqrt{R})$,
with the recovery threshold, assuming that block-sparsity is
exploited, given by $kd < d(\sqrt{R} + 1)/2$. The coherence of
the dictionary is $\mu = \|\mathrm{vec}(U_d)\|_\infty / \sqrt{R}$. Fig. 1, obtained
by averaging over randomly chosen unitary matrices $U_d$,
shows that the recovery thresholds obtained by taking block-sparsity
into account can be significantly higher than those for
conventional sparsity. In particular, for $U_d = I_d$, we obtain
the conventional recovery threshold as $kd < (\sqrt{R} + 1)/2$,
which allows us to conclude that exploiting block-sparsity can
result in guaranteed recovery for a sparsity level that is $d$
times higher than what would be obtained in the (conventional)
sparse case.
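The coherence values quoted for this dictionary are easy to reproduce. The sketch below assumes that block-coherence is defined as $\mu_B = \max_{\ell \ne r} \frac{1}{d}\rho(D^H[\ell] D[r])$ and that coherence is the largest absolute inner product between distinct unit-norm columns; the function names, the sizes $R = 16$ and $d = 4$, and the random orthogonal $U_d$ are choices made here for illustration.

```python
import numpy as np

def block_coherence(D, d):
    """mu_B: max over block pairs l != r of (1/d) * spectral norm of D[l]^H D[r]."""
    n_blocks = D.shape[1] // d
    G = D.conj().T @ D
    best = 0.0
    for l in range(n_blocks):
        for r in range(n_blocks):
            if l != r:
                block = G[l*d:(l+1)*d, r*d:(r+1)*d]
                best = max(best, np.linalg.norm(block, 2) / d)
    return best

def coherence(D):
    """mu: largest absolute inner product between distinct unit-norm columns."""
    G = np.abs(D.conj().T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

R, d = 16, 4
L = R * d
rng = np.random.default_rng(1)
F = np.fft.fft(np.eye(R)) / np.sqrt(R)            # unitary DFT matrix
U, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal U_d
D = np.hstack([np.eye(L), np.kron(F, U)])         # D = [Phi, Psi]

mu_B = block_coherence(D, d)
mu = coherence(D)
block_threshold = (1.0 / mu_B + d) / 2            # bound on kd with nu = 0
conventional_threshold = (1.0 / mu + 1) / 2       # bound on kd without block structure
print(mu_B, mu, block_threshold, conventional_threshold)
```

For these sizes the block threshold evaluates to $d(\sqrt{R}+1)/2 = 10$, whereas the conventional threshold depends on the largest entry of $U_d$ and cannot exceed 4.5.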

VII. Numerical Results 

The aim of this section is to quantify the improvement in the
recovery properties of OMP and BP obtained by taking block-sparsity
explicitly into account and performing recovery using
BOMP and L-OPT, respectively. In all simulation examples
below, we randomly generate dictionaries by drawing from
i.i.d. Gaussian matrices and normalizing the resulting columns
to 1. The dictionary is divided into consecutive blocks of
length $d$. The sparse vector to be recovered has i.i.d. Gaussian
entries on the randomly chosen support set (according to a
uniform prior).

Fig. 2. Performance of OMP, BOMP, and BOMP-O for a dictionary with
$L = 40$, $N = 400$, and $d = 4$. (Horizontal axis: block-sparsity level.)

Fig. 3. Performance of OMP, BOMP, and BOMP-O for a dictionary with
$L = 80$, $N = 160$, and $d = 8$. (Horizontal axis: block-sparsity level.)

In Figs. 2 and 3 we plot the recovery success rate$^2$ as
a function of the block-sparsity level of the signal to be
recovered. For each block-sparsity level we average over 1000
pairs of realizations of the dictionary and the block-sparse
signal. We can see that BOMP outperforms OMP significantly
and BOMP with orthogonalized blocks, denoted as BOMP-O,
yields slightly better performance than BOMP. We also
evaluate the performance of L-OPT compared to BP, as well
as L-OPT run on orthogonalized blocks, termed L-OPT-O.
For each block-sparsity level we average over 200 pairs of
realizations of the dictionary and the block-sparse signal. The
corresponding results, depicted in Figs. 4 and 5, show that
L-OPT outperforms BP, and L-OPT-O slightly outperforms
L-OPT. Furthermore, we can see that BOMP-O significantly
outperforms L-OPT-O.
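The simulation setup described above can be sketched as follows. This is a minimal illustration rather than the code used for the figures: the BOMP variant shown (block selection by largest correlation energy followed by least squares on the chosen blocks), the success tolerance `tol`, the real-valued data, and all function names are assumptions made for this example.

```python
import numpy as np

def make_problem(rng, L, N, d, k):
    """Gaussian dictionary with unit-norm columns and a block k-sparse signal."""
    D = rng.standard_normal((L, N))
    D /= np.linalg.norm(D, axis=0)
    support = rng.choice(N // d, size=k, replace=False)   # uniform block support
    x = np.zeros(N)
    for i in support:
        x[i*d:(i+1)*d] = rng.standard_normal(d)
    return D, x

def bomp(D, y, d, k):
    """Block OMP sketch: pick the best-matching block, then refit by least squares."""
    N = D.shape[1]
    r = y.copy()
    chosen = []
    x_hat = np.zeros(N)
    for _ in range(k):
        scores = [np.linalg.norm(D[:, i*d:(i+1)*d].T @ r) for i in range(N // d)]
        for i in chosen:
            scores[i] = -1.0          # never reselect a block already in use
        chosen.append(int(np.argmax(scores)))
        cols = np.concatenate([np.arange(j*d, (j+1)*d) for j in chosen])
        coef, *_ = np.linalg.lstsq(D[:, cols], y, rcond=None)
        x_hat = np.zeros(N)
        x_hat[cols] = coef
        r = y - D @ x_hat
        if np.linalg.norm(r) < 1e-10:  # exact fit reached early
            break
    return x_hat

def success_rate(rng, L, N, d, k, trials, tol=1e-6):
    """Fraction of trials in which BOMP returns a vector within tol of the truth."""
    hits = 0
    for _ in range(trials):
        D, x = make_problem(rng, L, N, d, k)
        hits += int(np.linalg.norm(bomp(D, D @ x, d, k) - x) < tol)
    return hits / trials

rng = np.random.default_rng(0)
for k in (1, 2, 3):
    print(k, success_rate(rng, L=40, N=400, d=4, k=k, trials=20))
```

The dimensions follow the caption of Fig. 2; the small number of trials keeps the sketch fast and is far below the 1000 realizations used in the paper, so the printed rates are only indicative.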

VIII. Conclusion 

This paper extends the concepts of uncertainty relations, 
coherence, and recovery thresholds for matching pursuit and 
basis pursuit to the case of sparse signals that have additional 
structure, namely block-sparsity. The extension is made pos- 
sible by an appropriate definition of block-coherence. 

The motivation for considering block-sparse signals is two-fold.
First, in many applications the nonzero elements of
sparse vectors tend to cluster in blocks; several examples
are given in [14]. Second, it is shown in [14] that sampling
problems over unions of subspaces can be converted into
block-sparse recovery problems. Specifically, this is true when
the union has a direct-sum decomposition, which is the case
in many applications including multiband signals [26], [16],
[17], [18]. Reducing union of subspaces problems to block-sparse
recovery problems allows for the first general class of
concrete recovery methods for union of subspace problems.
This was the main contribution of [14] together with equivalence
and robustness proofs for L-OPT based on a suitably
modified definition of the restricted isometry property. Here,
we complement this contribution by developing similar results
using the concept of block-coherence.

$^2$Success is declared if the recovered vector is within a certain small
Euclidean distance of the original vector.

Fig. 4. Performance of BP, L-OPT, L-OPT-O, and BOMP-O for a dictionary
with $L = 40$, $N = 400$, and $d = 4$. (Horizontal axis: block-sparsity level.)



Appendix A
Proof of Lemma 1

We first prove (46):

\begin{align}
\|Ax\|_{2,\infty} &= \max_j \Bigl\|\sum_i A[j,i]\, x[i]\Bigr\|_2 \notag \\
&\le \max_j \sum_i \|A[j,i]\, x[i]\|_2 \notag \\
&\le \max_j \sum_i \|x[i]\|_2\, \rho(A[j,i]) \notag \\
&\le \|x\|_{2,\infty} \max_j \sum_i \rho(A[j,i]). \tag{76}
\end{align}

Fig. 5. Performance of BP, L-OPT, L-OPT-O, and BOMP-O for a dictionary
with $L = 80$, $N = 160$, and $d = 8$. (Horizontal axis: block-sparsity level.)



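Therefore, for any $x \in \mathbb{C}^N$ with $x \ne 0$, we have

\begin{equation}
\frac{\|Ax\|_{2,\infty}}{\|x\|_{2,\infty}} \le \rho_r(A) \tag{77}
\end{equation}

which establishes (46). The proof of (47) is similar:

\begin{align}
\|Ax\|_{2,1} &= \sum_j \Bigl\|\sum_i A[j,i]\, x[i]\Bigr\|_2 \notag \\
&\le \sum_j \sum_i \|A[j,i]\, x[i]\|_2 \notag \\
&\le \sum_i \|x[i]\|_2 \sum_j \rho(A[j,i]) \notag \\
&\le \rho_c(A)\, \|x\|_{2,1} \tag{78}
\end{align}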

from which the result follows. Finally, we have $\rho_c(A^H) = \rho_r(A)$.

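Appendix B
Proof of Lemma 2

Nonnegativity and positivity follow immediately from the
fact that the spectral norm is a matrix norm [32, p. 295].
Homogeneity follows by noting that

\begin{align}
\rho_c(\alpha A) &= \max_r \sum_\ell \rho(\alpha A[\ell, r]) \notag \\
&= \max_r \sum_\ell |\alpha|\, \rho(A[\ell, r]) \notag \\
&= |\alpha|\, \rho_c(A). \tag{79}
\end{align}

The triangle inequality is obtained as follows:

\begin{align*}
\rho_c(A + B) &= \max_r \sum_\ell \rho(A[\ell, r] + B[\ell, r]) \\
&\le \max_r \Bigl\{ \sum_\ell \rho(A[\ell, r]) + \sum_\ell \rho(B[\ell, r]) \Bigr\} \\
&\le \max_r \sum_\ell \rho(A[\ell, r]) + \max_r \sum_\ell \rho(B[\ell, r]) \\
&= \rho_c(A) + \rho_c(B)
\end{align*}

where the first inequality is a consequence of the spectral norm
satisfying the triangle inequality.

Finally, to verify submultiplicativity, note that

\begin{equation}
\rho_c(AB) = \max_\ell \rho_c(AB[\ell]). \tag{80}
\end{equation}

Therefore, if we prove that

\begin{equation}
\rho_c(AB[\ell]) \le \rho_c(A)\, \rho_c(B[\ell]) \tag{81}
\end{equation}

the result follows from (80) and the fact that $\max_\ell \rho_c(B[\ell]) = \rho_c(B)$.

To prove (81), note that

\begin{align}
\rho_c(AB[\ell]) &= \sum_i \rho\Bigl(\sum_j A[i,j]\, B[j,\ell]\Bigr) \notag \\
&\le \sum_i \sum_j \rho(A[i,j]\, B[j,\ell]) \notag \\
&\le \sum_i \sum_j \rho(A[i,j])\, \rho(B[j,\ell]) \tag{82}
\end{align}

where we used the triangle inequality for, and the submultiplicativity
of, the spectral norm. Now, we have

\begin{equation}
\sum_i \rho(A[i,j]) \le \max_j \sum_i \rho(A[i,j]) = \rho_c(A). \tag{83}
\end{equation}

Substituting into (82) yields

\begin{equation}
\rho_c(AB[\ell]) \le \rho_c(A) \sum_j \rho(B[j,\ell]) = \rho_c(A)\, \rho_c(B[\ell]) \tag{84}
\end{equation}

which completes the proof.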
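The three norm properties established above can be spot-checked numerically. The sketch below draws random square block matrices and evaluates $\rho_c$ directly from its definition as the largest block-column sum of spectral norms; the helper name `rho_c`, the sizes, and the seed are illustrative assumptions, and passing such a check is of course no substitute for the proof.

```python
import numpy as np

def rho_c(A, d):
    """Largest block-column sum of spectral norms of the d x d blocks of A."""
    n_rows, n_cols = A.shape[0] // d, A.shape[1] // d
    return max(
        sum(np.linalg.norm(A[l*d:(l+1)*d, r*d:(r+1)*d], 2) for l in range(n_rows))
        for r in range(n_cols)
    )

rng = np.random.default_rng(2)
d, n = 3, 4                                   # n x n grid of d x d blocks
A = rng.standard_normal((n * d, n * d))
B = rng.standard_normal((n * d, n * d))
alpha = -2.5

checks = {
    "homogeneity": bool(np.isclose(rho_c(alpha * A, d), abs(alpha) * rho_c(A, d))),
    "triangle": bool(rho_c(A + B, d) <= rho_c(A, d) + rho_c(B, d) + 1e-12),
    "submultiplicative": bool(rho_c(A @ B, d) <= rho_c(A, d) * rho_c(B, d) + 1e-12),
}
print(checks)
```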

Appendix C
Proof of Lemma 3

The proof of the statement $\|Av\|_{2,1} \le \rho_c(A)\, \|v\|_{2,1}$
follows directly from (78) by replacing $A$ by an $L \times (kd)$
matrix and $x \in \mathbb{C}^N$ by $v \in \mathbb{C}^{kd}$ with $\|v[\ell]\|_2 > 0$, for
all $\ell$. If the $a_i = \sum_j \rho(A[j,i])$ are not all equal, then the
last inequality in (78) is strict. Since $a_i = \rho_c(AJ_i)$, the result
follows.
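A quick numerical illustration of Lemma 3 is given below. It is a sketch with assumed sizes and helper names (`rho_c`, `mixed_norm_21`): for a random matrix the block-column sums $a_i = \rho_c(AJ_i)$ differ, so the inequality is expected to be strict.

```python
import numpy as np

def rho_c(A, d):
    """Largest block-column sum of spectral norms of the d x d blocks of A."""
    n_rows, n_cols = A.shape[0] // d, A.shape[1] // d
    return max(
        sum(np.linalg.norm(A[l*d:(l+1)*d, r*d:(r+1)*d], 2) for l in range(n_rows))
        for r in range(n_cols)
    )

def mixed_norm_21(v, d):
    """Sum of the Euclidean norms of the length-d blocks of v."""
    return sum(np.linalg.norm(v[i*d:(i+1)*d]) for i in range(len(v) // d))

rng = np.random.default_rng(3)
d, k, R = 2, 3, 5
A = rng.standard_normal((R * d, k * d))       # L x (kd) with L = R*d
v = rng.standard_normal(k * d)                # every block is nonzero almost surely

lhs = mixed_norm_21(A @ v, d)
rhs = rho_c(A, d) * mixed_norm_21(v, d)
a = [rho_c(A[:, i*d:(i+1)*d], d) for i in range(k)]   # a_i = rho_c(A J_i)
print(lhs, rhs, a)
```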



References 

[1] E. J. Candes, J. K. Romberg, and T. Tao, "Robust uncertainty principles: 
Exact signal reconstruction from highly incomplete frequency informa- 
tion," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006. 

[2] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, 
no. 4, pp. 1289-1306, Apr. 2006. 

[3] G. Davis, S. Mallat, and M. Avellaneda, "Adaptive greedy approxima- 
tions," Constr. Approx., vol. 13, no. 1, pp. 57-98, 1997. 

[4] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle
regression," Ann. Statist., vol. 32, no. 2, pp. 407-499, 2004.

[5] J. Tropp, "Greed is good: Algorithmic results for sparse approximation," 
IEEE Trans. Inf. Theory, vol. 50, no. 10, pp. 2231-2242, Oct. 2004. 

[6] E. J. Candes and T. Tao, "Decoding by linear programming," IEEE
Trans. Inf. Theory, vol. 51, no. 12, pp. 4203-4215, Dec. 2005.

[7] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition 
by basis pursuit," SIAM J. Sci. Comput., vol. 20, no. 1, pp. 33-61, 1999. 

[8] S. G. Mallat and Z. Zhang, "Matching pursuits and time-frequency 
dictionaries," IEEE Trans. Sig. Proc, vol. 41, no. 12, pp. 3397-3415, 
Dec. 1993. 

[9] E. J. Candes, "The restricted isometry property and its implications for 
compressed sensing," C. R. Acad. Sci. Paris, Ser. I, vol. 346, pp. 589- 
592, 2008. 






[10] D. L. Donoho and X. Huo, "Uncertainty principles and ideal atomic 
decompositions," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2845- 
2862, Nov. 2001. 

[11] M. Elad and A. M. Bruckstein, "A generalized uncertainty principle and 
sparse representation in pairs of bases," IEEE Trans. Inf. Theory, vol. 48, 
no. 9, pp. 2558-2567, Sept. 2002. 

[12] D. L. Donoho and M. Elad, "Optimally sparse representation in general
(nonorthogonal) dictionaries via $\ell^1$ minimization," Proc. Natl. Acad. Sci.,
vol. 100, no. 5, pp. 2197-2202, Mar. 2003.

[13] Y. C. Eldar, "Uncertainty relations for analog signals," IEEE Trans. Inf. 
Theory, Sept. 2008, submitted. 

[14] Y. C. Eldar and M. Mishali, "Robust recovery of signals from a struc- 
tured union of subspaces," IEEE Trans. Inf. Theory, 2008, submitted. 

[15] Y. C. Eldar and M. Mishali, "Block-sparsity and sampling over a union of
subspaces," to appear in DSP2009.

[16] M. Mishali and Y. C. Eldar, "Blind multi-band signal reconstruction:
Compressed sensing for analog signals," IEEE Trans. Sig. Proc., vol. 57,
no. 3, pp. 993-1009, Mar. 2009.

[17] M. Mishali and Y. C. Eldar, "From theory to practice: Sub-Nyquist sampling
of sparse wideband analog signals," arXiv 0902.4291; submitted to IEEE Journal of
Selected Topics in Signal Processing, 2009.

[18] H. J. Landau, "Necessary density conditions for sampling and interpolation
of certain entire functions," Acta Math., vol. 117, no. 1, pp. 37-52,
1967.

[19] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi, "Recovering sparse
signals using sparse measurement matrices in compressed DNA microarrays,"
IEEE Journal of Selected Topics in Signal Processing, vol. 2,
no. 3, pp. 275-285, Jun. 2008.

[20] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse
solutions to linear inverse problems with multiple measurement vectors,"
IEEE Trans. Sig. Proc., vol. 53, no. 7, pp. 2477-2488, Jul. 2005.

[21] J. Chen and X. Huo, "Theoretical results on sparse representations of
multiple-measurement vectors," IEEE Trans. Sig. Proc., vol. 54, no. 12,
pp. 4634-4643, Dec. 2006.

[22] M. Mishali and Y. C. Eldar, "Reduce and boost: Recovering arbitrary
sets of jointly sparse vectors," IEEE Trans. Sig. Proc., vol. 56, no. 10,
pp. 4692-4702, Oct. 2008.

[23] Y. C. Eldar and H. Rauhut, "Average case analysis of multichannel sparse
recovery using convex relaxation," submitted to IEEE Trans. Inf. Theory.

[24] Y. M. Lu and M. N. Do, "Sampling signals from a union of subspaces,"
IEEE Sig. Proc. Mag., vol. 25, no. 2, pp. 41-47, Mar. 2008.

[25] T. Blumensath and M. E. Davies, "Sampling theorems for signals from
the union of finite-dimensional linear subspaces," IEEE Trans. Inf.
Theory, vol. 55, no. 4, pp. 1872-1882, Apr. 2009.

[26] Y. C. Eldar, "Compressed sensing of analog signals in shift-invariant
spaces," to appear in IEEE Trans. Sig. Proc.

[27] M. Stojnic, F. Parvaresh, and B. Hassibi, "On the reconstruction of
block-sparse signals with an optimal number of measurements," IEEE
Trans. Sig. Proc., 2009, to appear.

[28] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based
compressive sensing," IEEE Trans. Inf. Theory, 2008, submitted.

[29] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from
incomplete and inaccurate samples," Applied and Computational Harmonic
Analysis, vol. 26, pp. 301-321, 2009.

[30] T. Blumensath and M. E. Davies, "Iterative hard thresholding for
compressed sensing," Jan. 2009, submitted to Elsevier.

[31] L. Peotta and P. Vandergheynst, "Matching pursuit with block incoherent
dictionaries," IEEE Trans. Sig. Proc., vol. 55, no. 9, pp. 4549-4557,
2007.

[32] R. A. Horn and C. R. Johnson, Matrix Analysis. New York, NY: 
Cambridge Press, 1985. 

[33] R. Gribonval and P. Vandergheynst, "On the exponential convergence of 
matching pursuits in quasi-incoherent dictionaries," IEEE Trans. Inform. 
Theory, vol. 52, no. 1, pp. 255-261, Jan. 2006. 

[34] R. A. DeVore and V. N. Temlyakov, "Some remarks on greedy algo- 
rithms," Adv. in Comp. Math., vol. 5, pp. 173-187, 1996. 

[35] T. Strohmer and R. W. Heath Jr., "Grassmannian frames with appli- 
cations to coding and communication," Applied and Computational 
Harmonic Analysis, vol. 14, pp. 257-275, 2003. 



